Abstract

Consider $N \times N$ Hermitian or symmetric random matrices $H$ where the distribution of the $(i,j)$ matrix element is given by a probability measure $\nu_{ij}$ with a subexponential decay. Let $\sigma_{ij}^2$ be the variance for the probability measure $\nu_{ij}$ with the normalization property that $\sum_i \sigma_{ij}^2 = 1$ for all $j$. Under essentially the only condition that $c \le N\sigma_{ij}^2 \le c^{-1}$ for some constant $c > 0$, we prove that, in the limit $N \to \infty$, the eigenvalue spacing statistics of $H$ in the bulk of the spectrum coincide with those of the Gaussian unitary or orthogonal ensemble (GUE or GOE). We also show that for band matrices with bandwidth $M$ the local semicircle law holds to the energy scale $M^{-1}$.

1 Introduction



One key universal quantity for random matrices is the eigenvalue gap distribution. Although the density 
of eigenvalues may depend on the specific model, the gap distribution or the short distance correlation 
function are believed to depend only on the symmetry class of the ensembles but are otherwise independent 
of the details of the distributions. There are two types of universality: the edge universality and the bulk 
universality. In this paper, we will focus on the bulk universality concerning the interior of the spectrum. The 
bulk universality was proved for very general classes of invariant ensembles (see, e.g. [3, 6, 7, 8, 9, 25, 26, 27] 
and references therein). For non-invariant ensembles, in particular for matrices with i.i.d. entries (Wigner
matrices), the bulk universality was difficult to establish due to the lack of an explicit expression for the 
joint distribution of the eigenvalues. 

The first rigorous partial result for bulk universality in the non-unitary case was given by Johansson [23] 
(see also Ben Arous and Péché [2] and the recent improvement [24]) stating that the bulk universality holds
for Gaussian divisible Hermitian ensembles, i.e., Hermitian ensembles of the form 

H + sV, (1.1) 

where H is a Wigner matrix, V is an independent standard GUE matrix and s is a positive constant of 
order one. The restriction on Gaussian divisibility turned out to be very difficult to remove. In a series of 
papers [12, 13, 14, 17], we developed a new approach to prove the universality. The first step was to derive 
the local semicircle law, an estimate of the local eigenvalue density, down to energy scales containing around 
log N eigenvalues. Once such a strong form of the local semicircle law was obtained, the result of [23, 2] can 
be extended to a Gaussian convolution with variance only $s^2 \asymp N^{-1+\varepsilon}$. This tiny Gaussian component can
then be removed via a reverse heat flow argument and this proves [17] the bulk universality for Hermitian
ensembles provided that the distributions of the matrix elements are sufficiently differentiable.

The bulk universality for Hermitian ensembles was also proved later on by Tao and Vu [31] under the 
condition that the first four moments of the matrix elements match those of GUE, but without the
differentiability assumption. The condition on the fourth moment was already removed in [31] by using the result
for Gaussian divisible ensembles of [23, 2]; the third moment condition was then removed in [18] by using 
the result of [17]. 

The four moment theorem [31] is also valid for the symmetric ensembles, but the restriction on the 
matching of the first four moments cannot be weakened for the following reason. The key input to remove 
the fourth moment matching condition for the Hermitian case, the universality of the Gaussian divisible 
ensembles [23, 2], relied entirely on the asymptotic analysis of an explicit formula, closely related to a 
formula in Brézin-Hikami [5, 23], for the correlation functions of the eigenvalues for the Hermitian ensembles
$H + sV$. Since similar formulas for symmetric matrices are very complicated, the corresponding result is
not available and thus the matching of the fourth moment cannot be removed in this way. Although there
is a proof [16] of universality for $s^2 > N^{-3/4}$ without using this formula, the main ingredient of that proof,
establishing the uniqueness of the local equilibria of the Dyson Brownian motion, still heavily used explicit 
formulas related to GUE. 

In [15] a completely different strategy was introduced based on a local relaxation flow, which locally 
behaves like a Dyson Brownian motion, but has a faster decay to equilibrium. This approach entirely 
eliminates explicit formulas and it gives a unified proof for the universality of symmetric and Hermitian 
Wigner matrices [15]. It was further generalized [19] to quaternion self-dual Wigner matrices and sample 
covariance matrices. The method not only applies to all these specific ensembles, but it also gives a conceptual 
interpretation that the occurrence of the universality is due to the relaxation to local equilibrium of the
Dyson Brownian motion (DBM). We remark that very recently the results of [31] were also extended to sample covariance matrices [33].

The main input of all these methods [17, 15, 19] and [31, 33] is an estimate of the local density of
eigenvalues, the local semicircle law. This has been developed in the previous work on Wigner matrices 
[12, 13, 14], where the matrix elements were i.i.d. random variables. In this paper, we extend this method 
to random matrices with independent, but not necessarily identically distributed entries. If we denote the variance of the $(i,j)$ entry of the matrix by $\sigma_{ij}^2$, our main interest is the case that $\sigma_{ij}^2$ are not constant but they satisfy the normalization condition $\sum_i \sigma_{ij}^2 = 1$ for all $j$. We will call such matrix ensembles universal Wigner matrices. For these ensembles Guionnet [21] and Anderson-Zeitouni [1] proved that the density of the eigenvalues converges to the Wigner semicircle law. The simplest case is that of generalized Wigner matrices, where $N\sigma_{ij}^2$ is uniformly bounded from above and below by two fixed positive numbers. In this case, we prove the local semicircle law down to essentially the smallest possible energy scale $N^{-1}$ (modulo $\log N$ factors). A much more difficult case is the Wigner band matrices where, roughly speaking, $\sigma_{ij}^2 = 0$ if $|i - j| > M$ for some $M < N$. In this case, we obtain the local semicircle law to the energy scale $M^{-1}$. We note that a certain three-dimensional version of Gaussian band matrices was considered by Disertori, Pinson and Spencer [10] using the supersymmetric method. They proved that the expectation of the density of eigenvalues is smooth and it coincides with the Wigner semicircle law.

With the local semicircle law proved up to the almost optimal scale, applying the method of [15, 19] leads 
to the identification of the correlation functions and the gap distribution for generalized Wigner matrices 
provided that the distribution of the matrix elements is continuous and satisfies the logarithmic Sobolev
inequality. These additional assumptions can be removed if one can extend the Tao-Vu theorem [31] to
generalized Wigner matrices. In Section 8, we will introduce an approach based on a Green's function
comparison theorem, which states that the joint distributions of Green's functions of two ensembles at
different energies with imaginary parts of order $1/N$ are identical provided that the first three moments of
the two ensembles coincide and the fourth moments are close. Since local correlation functions and the gap
distribution of the eigenvalues can be identified from Green's functions, it follows that the local correlation
functions of these two ensembles are identical at the scale $1/N$. We can thus use this theorem to remove all
continuity and logarithmic Sobolev inequality restrictions in our approach. In particular, this leads to the
bulk universality for generalized Wigner matrices with the subexponential decay being essentially the only
assumption on the probability law. We note that one major technical difficulty in [31], the level repulsion 
estimate, is not needed in the proof of the Green's function comparison theorem. It will be clear in Section 
8 that, once the local semicircle law is established, the Green's function comparison theorem is a simple 
consequence of the standard resolvent perturbation theory. 

2 Main results 

We now state the main results of this paper. Since all our results hold for both Hermitian and symmetric 
ensembles, we will state the results for Hermitian matrices only. The modifications to the symmetric case 
are straightforward and they will be omitted. Let $H = (h_{ij})_{i,j=1}^N$ be an $N \times N$ Hermitian matrix where the matrix elements $h_{ij} = \bar h_{ji}$, $i \le j$, are independent random variables given by a probability measure $\nu_{ij}$ with mean zero and variance $\sigma_{ij}^2$. The variance of $h_{ij}$ for $i > j$ is $\sigma_{ij}^2 = \mathbb{E}|h_{ij}|^2 = \sigma_{ji}^2$. For simplicity of the presentation, we assume that for any fixed $1 \le i < j \le N$, $\mathrm{Re}\, h_{ij}$ and $\mathrm{Im}\, h_{ij}$ are i.i.d. with distribution $\omega_{ij}$, i.e., $\nu_{ij} = \omega_{ij} \otimes \omega_{ij}$ in the sense that $\nu_{ij}(dh) = \omega_{ij}(d\,\mathrm{Re}\, h)\,\omega_{ij}(d\,\mathrm{Im}\, h)$, but this assumption is not essential for the result. The distribution $\nu_{ij}$ and its variance $\sigma_{ij}^2$ may depend on $N$, but we suppress this in the notation. We assume that, for any $j$ fixed,

$$\sum_i \sigma_{ij}^2 = 1. \tag{2.1}$$

Matrices with independent, zero mean entries and with the normalization condition (2.1) will be called universal Wigner matrices. For a forthcoming review on this matrix class, see [29], where the terminology of random band matrices was used.

Define $C_{inf}$ and $C_{sup}$ by

$$C_{inf} := \inf_{N,i,j} \{N\sigma_{ij}^2\} \le \sup_{N,i,j} \{N\sigma_{ij}^2\} =: C_{sup}. \tag{2.2}$$

Note that $C_{inf} = C_{sup}$ corresponds to the standard Wigner matrices and the condition $0 < C_{inf} \le C_{sup} < \infty$ defines more general Wigner matrices with comparable variances.

We will also consider an even more general case when $\sigma_{ij}$ for different indices are not comparable. The basic parameter of such matrices is the quantity

$$M := \frac{1}{\max_{ij} \sigma_{ij}^2}. \tag{2.3}$$

A special case is the band matrix, where $\sigma_{ij} = 0$ for $|i - j| > W$ with some parameter $W$. In this case, $M$ and $W$ are related by $M \le CW$.

Denote by $B := \{\sigma_{ij}^2\}_{i,j=1}^N$ the matrix of variances which is symmetric and doubly stochastic by (2.1), in particular it satisfies $-1 \le B \le 1$. Let the spectrum of $B$ be supported in

$$\mathrm{Spec}(B) \subset [-1 + \delta_-, 1 - \delta_+] \cup \{1\} \tag{2.4}$$

with some nonnegative constants $\delta_\pm$. We will always have the following spectral assumption:

$$\text{1 is a simple eigenvalue of } B \text{ and } \delta_- \text{ is a positive constant, independent of } N. \tag{2.5}$$

The local semicircle law will be proven under this general condition, but the precision of the estimate near the spectral edge will also depend on $\delta_+$ in an explicit way. For the orientation of the reader, we mention two special cases of universal Wigner matrices that provided the main motivation for our work.

Example 1. Generalized Wigner matrix. In this case we have

$$0 < C_{inf} \le C_{sup} < \infty, \tag{2.6}$$

and one can easily prove that 1 is a simple eigenvalue of $B$ and (2.4) holds with

$$\delta_\pm \ge C_{inf}, \tag{2.7}$$

i.e., both $\delta_-$ and $\delta_+$ are positive constants independent of $N$.

Example 2. Band matrix. The variances are given by

$$\sigma_{ij}^2 = W^{-1} f\Big(\frac{[i-j]_N}{W}\Big), \tag{2.8}$$

where $W \ge 1$, $f : \mathbb{R} \to \mathbb{R}_+$ is a bounded nonnegative symmetric function with $\int f = 1$ and we defined $[i-j]_N \in \mathbb{Z}$ by the property that $[i-j]_N \equiv i - j \mod N$ and $-\frac{1}{2}N < [i-j]_N \le \frac{1}{2}N$. Note that the relation (2.1) holds only asymptotically as $W \to \infty$ but this can be remedied by an irrelevant rescaling. If the bandwidth is comparable with $N$, then we also have to assume that $f(x)$ is supported in $|x| \le N/(2W)$. The quantity $M$ defined in (2.3) satisfies $M \le W/\|f\|_\infty$. In Appendix A we will show that (2.5) is satisfied for the choice of (2.8) if $W$ is large enough.
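The band-matrix example can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: the box profile `box` (equal to 1/2 on [-1, 1]), the sizes `N = 200`, `W = 10` and all function names are hypothetical choices, not taken from the paper. It builds the first row of the periodic variance matrix (2.8), rescales it so that (2.1) holds exactly, and reads off the parameter $M$ of (2.3) and the spectral gaps of (2.4) from the explicit eigenvalues of a symmetric circulant matrix.

```python
import math

def band_variances(N, W, f):
    """First row c[k] = sigma^2_{1,1+k} of the circulant variance matrix (2.8),
    rescaled so that the row sum is exactly 1, as required by (2.1)."""
    def wrap(d):
        # representative of d mod N in the interval (-N/2, N/2]
        d %= N
        return d - N if d > N / 2 else d
    raw = [f(wrap(k) / W) / W for k in range(N)]
    total = sum(raw)
    return [x / total for x in raw]

def circulant_spectrum(c):
    """Eigenvalues of the symmetric circulant matrix with first row c."""
    N = len(c)
    return [sum(c[k] * math.cos(2 * math.pi * p * k / N) for k in range(N))
            for p in range(N)]

def box(x):
    # hypothetical profile: f = 1/2 on [-1, 1], so that the integral of f is 1
    return 0.5 if abs(x) <= 1 else 0.0

N, W = 200, 10                      # illustrative sizes only
c = band_variances(N, W, box)
eig = circulant_spectrum(c)
M = 1 / max(c)                      # the parameter M of (2.3)
delta_minus = 1 + min(eig)          # distance of the spectrum from -1
delta_plus = 1 - max(eig[1:])       # distance of the rest of the spectrum from 1
print(M, delta_minus, delta_plus)
```

For this particular profile the eigenvalue 1 is simple and $\delta_-$ stays of order one, while $\delta_+$ is small because the bandwidth is much smaller than the matrix size; this is the regime in which the dependence of the estimates on $\delta_+$ matters.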

The Stieltjes transform of the empirical eigenvalue distribution of $H$ is given by

$$m(z) = m_N(z) = \frac{1}{N} \mathrm{Tr}\, \frac{1}{H - z}, \qquad z = E + i\eta. \tag{2.9}$$

We define the density of the semicircle law

$$\varrho_{sc}(x) := \frac{1}{2\pi} \sqrt{[4 - x^2]_+} \tag{2.10}$$

and, for $\mathrm{Im}\, z > 0$, its Stieltjes transform

$$m_{sc}(z) := \int_{\mathbb{R}} \frac{\varrho_{sc}(x)}{x - z}\, dx. \tag{2.11}$$

The Stieltjes transform $m_{sc}(z) = m_{sc}$ may also be characterized as the unique solution of

$$m_{sc} + \frac{1}{z + m_{sc}} = 0 \tag{2.12}$$

satisfying $\mathrm{Im}\, m_{sc}(z) > 0$ for $\mathrm{Im}\, z > 0$, i.e.,

$$m_{sc}(z) = \frac{-z + \sqrt{z^2 - 4}}{2}. \tag{2.13}$$

Here the square root function is chosen with a branch cut along the positive real axis. This guarantees that the imaginary part of $m_{sc}$ is non-negative. The Wigner semicircle law states that $m_N(z) \to m_{sc}(z)$ for any fixed $z$ provided that $\eta = \mathrm{Im}\, z > 0$ is independent of $N$. The local version of this result for universal Wigner matrices is the content of Theorem 2.1 below.
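As a quick sanity check of (2.12) and (2.13), the following sketch evaluates $m_{sc}(z)$ numerically. It is a minimal illustration, not part of the argument: the function name `m_sc` and the sample point `z` are assumptions made here, and instead of fixing a branch cut the code simply picks, of the two roots of $m^2 + zm + 1 = 0$, the one with positive imaginary part, which is the branch singled out in the text.

```python
import cmath

def m_sc(z):
    """Stieltjes transform of the semicircle law for Im z > 0.

    Of the two roots of m^2 + z*m + 1 = 0, i.e. of (2.12), return the one
    with positive imaginary part; this is the branch selected in (2.13)."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

z = 0.5 + 0.01j                      # illustrative point close to the bulk
m = m_sc(z)
residual = abs(m + 1 / (z + m))      # vanishes up to rounding by (2.12)
print(m, residual)
```

Since the two roots have product 1 and sum $-z$, exactly one of them has positive imaginary part when $\mathrm{Im}\, z > 0$, so the selection in the last line of `m_sc` is unambiguous.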

Theorem 2.1 (Local semicircle law) Let $H = (h_{ij})$ be a Hermitian $N \times N$ random matrix where the matrix elements $h_{ij} = \bar h_{ji}$, $i \le j$, are independent random variables with $\mathbb{E} h_{ij} = 0$, $1 \le i, j \le N$, and assume that the variances $\sigma_{ij}^2 = \mathbb{E}|h_{ij}|^2$ satisfy (2.1), (2.4) and (2.5). Suppose that the distributions of the matrix elements have a uniformly subexponential decay in the sense that there exist constants $\alpha, \beta > 0$, independent of $N$, such that for any $x > 0$ we have

$$\mathbb{P}(|h_{ij}| \ge x^\alpha |\sigma_{ij}|) \le \beta e^{-x}. \tag{2.14}$$

Then there exist constants $C_1$, $C_2$, $C$ and $c > 0$, depending only on $\alpha$, $\beta$ and $\delta_-$ in (2.5), such that for any $z = E + i\eta$ with $\eta = \mathrm{Im}\, z > 0$, $|z| \le 10$ and

$$\frac{1}{\sqrt{M\eta}} \le \frac{\kappa^2}{(\log N)^{C_1}}, \tag{2.15}$$

where $\kappa := \big| |E| - 2 \big|$, the Stieltjes transform of the empirical eigenvalue distribution of $H$ satisfies

$$\mathbb{P}\Big( |m_N(z) - m_{sc}(z)| \ge \frac{(\log N)^{C_2}}{\sqrt{M\eta}\, \kappa} \Big) \le C N^{-c(\log\log N)} \tag{2.16}$$

for sufficiently large $N$. In fact, the same result holds for the individual matrix elements of the Green's function $G_{ii}(z) = (H - z)^{-1}(i,i)$:

$$\mathbb{P}\Big( \max_i |G_{ii}(z) - m_{sc}(z)| \ge \frac{(\log N)^{C_2}}{\sqrt{M\eta}\, \kappa} \Big) \le C N^{-c(\log\log N)}. \tag{2.17}$$

We remark that once a local semicircle law is obtained on a scale essentially $M^{-1}$, it is straightforward to show that eigenvectors are delocalized on a scale at least of order $M$. The precise statement will be formulated in Corollary 3.2. We will prove Theorem 2.1 in Sections 3-5 by extending the approach of [12, 13, 14]. The main ingredients of this approach consist of i) a derivation of a self-consistent equation for the Green's function and ii) an induction on the scale of the imaginary part of the energy. The key novelty in this paper is that the self-consistent equation is formulated for the array of the diagonal elements of the Green's function $(G_{11}, G_{22}, \ldots, G_{NN})$ instead of the Stieltjes transform $m = \frac{1}{N} \mathrm{Tr}\, G = \frac{1}{N} \sum_i G_{ii}$ itself as in [14]. This yields for the first time a strong pointwise control on the diagonal elements $G_{ii}$, see (2.17).

The subexponential decay condition (2.14) can be weakened if we are not aiming at error estimates faster than any power law of $N$. This can be easily carried out and we will not pursue it in this paper.

Denote the eigenvalues of $H$ by $\lambda_1, \ldots, \lambda_N$ and let $p_N(x_1, \ldots, x_N)$ be their (symmetric) probability density. For any $k = 1, 2, \ldots, N$, the $k$-point correlation function of the eigenvalues is defined by

$$p_N^{(k)}(x_1, x_2, \ldots, x_k) := \int_{\mathbb{R}^{N-k}} p_N(x_1, x_2, \ldots, x_N)\, dx_{k+1} \cdots dx_N. \tag{2.18}$$

We now state our main result concerning these correlation functions.

Theorem 2.2 (Universality for generalized Wigner matrices) We consider a generalized Hermitian Wigner matrix such that (2.6) holds. Assume that the distributions $\nu_{ij}$ of the matrix elements have a uniformly subexponential decay in the sense of (2.14). Suppose that the real and imaginary parts of $h_{ij}$ are i.i.d., distributed according to $\omega_{ij}$, i.e., $\nu_{ij}(dh) = \omega_{ij}(d\,\mathrm{Im}\, h)\,\omega_{ij}(d\,\mathrm{Re}\, h)$. Let $m_k(i,j) = \int x^k\, d\omega_{ij}(x)$, $1 \le k \le 4$, denote the $k$-th moment of $\omega_{ij}$ ($m_1 = 0$). Suppose that

$$\inf_N \min_{1 \le i,j \le N} \Big\{ \frac{m_4(i,j)}{(m_2(i,j))^2} - \frac{(m_3(i,j))^2}{(m_2(i,j))^3} \Big\} > 1, \tag{2.19}$$

then, for any $k \ge 1$ and for any compactly supported continuous test function $O : \mathbb{R}^k \to \mathbb{R}$, we have

$$\lim_{b \to 0} \lim_{N \to \infty} \frac{1}{2b} \int_{E-b}^{E+b} dE' \int_{\mathbb{R}^k} d\alpha_1 \cdots d\alpha_k\, O(\alpha_1, \ldots, \alpha_k)\, \frac{1}{\varrho_{sc}(E)^k} \Big( p_N^{(k)} - p_{GUE,N}^{(k)} \Big) \Big( E' + \frac{\alpha_1}{N\varrho_{sc}(E)}, \ldots, E' + \frac{\alpha_k}{N\varrho_{sc}(E)} \Big) = 0, \tag{2.20}$$

where $p_{GUE,N}^{(k)}$ is the $k$-point correlation function of the GUE ensemble. The same statement holds for generalized symmetric Wigner matrices, with GOE replacing the GUE ensemble.

The limiting correlation functions of the GUE ensemble are given by the sine kernel

$$\frac{1}{\varrho_{sc}(E)^k}\, p_{GUE,N}^{(k)} \Big( E + \frac{\alpha_1}{N\varrho_{sc}(E)}, \ldots, E + \frac{\alpha_k}{N\varrho_{sc}(E)} \Big) \to \det\{K(\alpha_i - \alpha_j)\}_{i,j=1}^k, \qquad K(x) = \frac{\sin \pi x}{\pi x},$$

and a similar universal formula is available for the limiting gap distribution.
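To make the sine-kernel limit tangible, the sketch below evaluates the $k = 2$ case of the determinant, which reduces to $1 - K(\alpha_1 - \alpha_2)^2$. The function names `sine_kernel` and `two_point_limit` are illustrative choices made here; the code only evaluates the limiting formula and does not simulate any matrix ensemble.

```python
import math

def sine_kernel(x):
    """K(x) = sin(pi x) / (pi x), with K(0) = 1 by continuity."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def two_point_limit(a1, a2):
    """The 2 x 2 determinant det{K(a_i - a_j)}, which equals 1 - K(a1 - a2)^2."""
    k = sine_kernel(a1 - a2)
    return 1.0 - k * k

print(two_point_limit(0.0, 0.0))    # coinciding points
print(two_point_limit(0.0, 0.25))   # small separation on the rescaled scale
print(two_point_limit(0.0, 1.0))    # separation of one mean level spacing
```

The value at coinciding points is zero and the function grows quadratically for small separations, which is the familiar level repulsion of the GUE on the scale of the mean eigenvalue spacing.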



Remark: The quantity in the bracket in (2.19) is always greater than or equal to 1 for any real distribution with mean zero, which can be obtained from the Cauchy-Schwarz inequality

$$m_3^2 = \Big[ \int x^3\, d\omega \Big]^2 = \Big[ \int x (x^2 - m_2)\, d\omega \Big]^2 \le \Big[ \int x^2\, d\omega \Big] \Big[ \int (x^2 - m_2)^2\, d\omega \Big] = m_2 (m_4 - m_2^2),$$

and it is exactly 1 if the distribution is supported on two points. For example, if $\omega_{ij}$ is a rescaling of a fixed distribution $\omega$ with variance $1/2$, i.e. $\omega_{ij}(x)\,dx = \sigma_{ij}^{-1} \omega(x/\sigma_{ij})\,dx$, then condition (2.19) is satisfied under (2.6), as long as the support of $\omega$ consists of at least three points. The case of a Bernoulli-type distribution supported on two points requires a separate argument and it will be treated in the forthcoming paper [20].
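The moment condition (2.19) is easy to check on examples. The following sketch computes the bracket in (2.19) exactly, using rational arithmetic, for two finitely supported mean-zero laws; the two sample distributions and the helper name `moment_ratio` are hypothetical illustrations chosen here, not examples from the paper.

```python
from fractions import Fraction

def moment_ratio(points):
    """m4/m2^2 - m3^2/m2^3 for a finitely supported law.

    `points` is a list of (value, probability) pairs with mean zero."""
    def moment(k):
        return sum(p * x ** k for x, p in points)
    assert moment(1) == 0, "the law must be centred"
    m2, m3, m4 = moment(2), moment(3), moment(4)
    return m4 / m2 ** 2 - m3 ** 2 / m2 ** 3

# Asymmetric two-point law: value 2 with probability 1/3, value -1 with probability 2/3.
two_point = [(Fraction(2), Fraction(1, 3)), (Fraction(-1), Fraction(2, 3))]
# Three-point law on {-1, 0, 1} with probabilities 1/4, 1/2, 1/4.
three_point = [(Fraction(-1), Fraction(1, 4)), (Fraction(0), Fraction(1, 2)),
               (Fraction(1), Fraction(1, 4))]

print(moment_ratio(two_point), moment_ratio(three_point))
```

The two-point law gives exactly 1, so it sits on the boundary excluded by (2.19), while the three-point law gives 2 and satisfies the condition; note that the ratio is invariant under rescaling of the distribution.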

We now state our main comparison theorem for matrix elements of Green's functions of two Wigner ensembles. As in the paper [31], we assume conditions on four moments. It will lead quickly to Theorem 6.4 stating that the correlation functions of eigenvalues of two matrix ensembles are identical up to scale $1/N$ provided that the first four moments of all matrix elements of these two ensembles are almost identical. Here we do not assume that the real and imaginary parts are i.i.d., hence the $k$-th moment of $h_{ij}$ is understood as the collection of numbers $\int \bar h^s h^{k-s}\, \nu_{ij}(dh)$, $s = 0, 1, 2, \ldots, k$. The main result in [31] compares the joint distribution of individual eigenvalues (which is not covered by our Theorem 2.3), but it does not address directly the matrix elements of Green's functions. The key input for both theorems is the local semicircle law on the almost optimal scale $N^{-1+\varepsilon}$. The eigenvalue perturbation used in [31] requires certain estimates on the eigenvalue level repulsion; the proof of Theorem 2.3 is a straightforward resolvent perturbation theory.

Theorem 2.3 (Green's function comparison) Suppose that we have two generalized $N \times N$ Wigner matrices, $H^{(v)}$ and $H^{(w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2} v_{ij}$ and $N^{-1/2} w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform subexponential decay condition

$$\mathbb{P}(|v_{ij}| \ge x^\alpha) \le \beta e^{-x}, \qquad \mathbb{P}(|w_{ij}| \ge x^\alpha) \le \beta e^{-x},$$

with some $\alpha, \beta > 0$. Fix a bijective ordering map on the index set of the independent matrix elements,

$$\phi : \{(i,j) : 1 \le i \le j \le N\} \to \{1, \ldots, \gamma(N)\}, \qquad \gamma(N) := \frac{N(N+1)}{2},$$

and denote by $H_\gamma$ the generalized Wigner matrix whose matrix elements $h_{ij}$ follow the $w$-distribution if $\phi(i,j) \le \gamma$ and they follow the $v$-distribution otherwise; in particular $H^{(v)} = H_0$ and $H^{(w)} = H_{\gamma(N)}$. Let $\kappa > 0$ be arbitrary and suppose that, for any small parameter $\tau > 0$ and for any $y \ge N^{-1+\tau}$, we have the following estimate on the diagonal elements of the resolvent

$$\mathbb{P}\Big( \max_{0 \le \gamma \le \gamma(N)} \max_{1 \le k \le N} \max_{|E| \le 2 - \kappa} \Big| \Big( \frac{1}{H_\gamma - E - iy} \Big)_{kk} \Big| \le N^{2\tau} \Big) \ge 1 - C N^{-c \log\log N} \tag{2.21}$$

with some constants $C, c$ depending only on $\tau, \kappa$. Moreover, we assume that the first three moments of $v_{ij}$ and $w_{ij}$ are the same, i.e.

$$\mathbb{E}\, \bar v_{ij}^s v_{ij}^u = \mathbb{E}\, \bar w_{ij}^s w_{ij}^u, \qquad 0 \le s + u \le 3,$$

and the difference between the fourth moments of $v_{ij}$ and $w_{ij}$ is much less than 1, say

$$\big| \mathbb{E}\, \bar v_{ij}^s v_{ij}^{4-s} - \mathbb{E}\, \bar w_{ij}^s w_{ij}^{4-s} \big| \le N^{-\delta}, \qquad s = 0, 1, 2, 3, 4, \tag{2.22}$$

for some given $\delta > 0$. Let $\varepsilon > 0$ be arbitrary and choose an $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$. For any sequence of positive integers $k_1, \ldots, k_n$, set complex parameters $z_j^m = E_j^m \pm i\eta$, $j = 1, \ldots, k_m$, $m = 1, \ldots, n$, with $|E_j^m| \le 2 - 2\kappa$ and with an arbitrary choice of the $\pm$ signs. Let $G^{(v)}(z) = (H^{(v)} - z)^{-1}$ denote the resolvent and let $F(x_1, \ldots, x_n)$ be a function such that for any multi-index $\alpha = (\alpha_1, \ldots, \alpha_n)$ with $1 \le |\alpha| \le 5$ and for any $\varepsilon' > 0$ sufficiently small, we have

$$\max\Big\{ |\partial^\alpha F(x_1, \ldots, x_n)| : \max_j |x_j| \le N^{\varepsilon'} \Big\} \le N^{C_0 \varepsilon'} \tag{2.23}$$

and

$$\max\Big\{ |\partial^\alpha F(x_1, \ldots, x_n)| : \max_j |x_j| \le N^2 \Big\} \le N^{C_0} \tag{2.24}$$

for some constant $C_0$.

Then, there is a constant $C_1$, depending on $\alpha$, $\beta$, $\sum_m k_m$ and $C_0$ such that for any $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$ and for any choices of the signs in the imaginary part of $z_j^m$, we have

$$\Big| \mathbb{E} F\Big( \frac{1}{N^{k_1}} \mathrm{Tr} \Big[ \prod_{j=1}^{k_1} G^{(v)}(z_j^1) \Big], \ldots, \frac{1}{N^{k_n}} \mathrm{Tr} \Big[ \prod_{j=1}^{k_n} G^{(v)}(z_j^n) \Big] \Big) - \mathbb{E} F\big( G^{(v)} \to G^{(w)} \big) \Big| \le C_1 N^{-1/2 + C_1 \varepsilon} + C_1 N^{-\delta + C_1 \varepsilon}, \tag{2.25}$$

where the arguments of $F$ in the second term are changed from the Green's functions of $H^{(v)}$ to $H^{(w)}$ and all other parameters remain unchanged.

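Remark 1: We formulated Theorem 2.3 for functions of traces of monomials of the Green's function because this is the form we need in the application. However, the result (and the proof we are going to present) holds directly for matrix elements of monomials of Green's functions as well, namely, for any choice of $\ell_1, \ldots, \ell_{2n}$ we have

$$\Big| \mathbb{E} F\Big( \frac{1}{N^{k_1 - 1}} \Big[ \prod_{j=1}^{k_1} G^{(v)}(z_j^1) \Big]_{\ell_1, \ell_2}, \ldots, \frac{1}{N^{k_n - 1}} \Big[ \prod_{j=1}^{k_n} G^{(v)}(z_j^n) \Big]_{\ell_{2n-1}, \ell_{2n}} \Big) - \mathbb{E} F\big( G^{(v)} \to G^{(w)} \big) \Big| \le C_1 N^{-1/2 + C_1 \varepsilon} + C_1 N^{-\delta + C_1 \varepsilon}. \tag{2.26}$$

We also remark that Theorem 2.3 holds for generalized Wigner matrices since $C_{sup} < \infty$ in (2.2). The positive lower bound on the variances, $C_{inf} > 0$, is not necessary for this theorem.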

Remark 2: Although we state Theorem 2.3 for Hermitian and symmetric ensembles, similar results hold
for real and complex sample covariance ensembles; the modification of the proof, to be given in Section 8, is 
obvious and we omit the details. 



To summarize, our approach to prove the universality is based on the following three steps; a detailed outline will be given in Section 6.

Step 1. Local semicircle law, i.e., Theorem 2.1. This will be proved in Sections 3-5.

Step 2. Universality for ensembles with smooth distributions satisfying the logarithmic Sobolev inequality (LSI), Theorem 6.3. The key input is the general theorem, Theorem 6.2, concerning the universality for the local relaxation flow. In Section 7, by using the local semicircle law and the LSI, we verify the assumptions for this theorem.

Step 3. Green's function comparison theorem, Theorem 2.3. This removes the restriction on the smoothness and the LSI, and it will be proved in Section 8.

Convention. We will frequently use the notation $C$, $c$ for generic positive constants whose exact values are irrelevant and may change from line to line. For two positive quantities $A$, $B$ we also introduce the notation $A \asymp B$ to indicate that there exists a universal constant $C$ such that $C^{-1} \le A/B \le C$.



3 Proof of local semicircle law 

Proof of Theorem 2.1 Recall that $G_{ij} = G_{ij}(z)$ denotes the matrix element

$$G_{ij} = \Big( \frac{1}{H - z} \Big)_{ij} \tag{3.1}$$

and

$$m(z) = m_N(z) = \frac{1}{N} \sum_{i=1}^N G_{ii}(z).$$

We will prove the following more detailed and stronger result.

Theorem 3.1 Assume the $N \times N$ random matrix $H$ satisfies (2.1), (2.4), (2.5) and (2.14), $\mathbb{E} h_{ij} = 0$, for any $1 \le i, j \le N$. Let $z = E + i\eta$ ($\eta > 0$) and let $g(z)$ be the real valued function defined by

$$g(z) = \min\Big\{ \sqrt{\kappa + \eta},\ \max\big\{ \delta_+,\ \big| \mathrm{Re}\, [m_{sc}(z)^2] - 1 \big| \big\} \Big\}, \tag{3.2}$$

where $\kappa = \big| |E| - 2 \big|$ and $\delta_+$ is given in (2.4). Then for all $z = E + i\eta$ with

$$\frac{(\kappa + \eta)^{1/4}}{\sqrt{M\eta}\, g^2(z)} \le (\log N)^{-13 - 6\alpha}, \qquad |z| \le 10, \tag{3.3}$$

we have

$$\mathbb{P}\Big( \max_i |G_{ii}(z) - m_{sc}(z)| \ge (\log N)^{11 + 6\alpha} \frac{(\kappa + \eta)^{1/4}}{\sqrt{M\eta}\, g(z)} \Big) \le C N^{-c(\log\log N)} \tag{3.4}$$

for sufficiently large $N$, with positive $c$ and $C > 0$ depending only on $\alpha$ and $\beta$ in (2.14) and $\delta_-$ in (2.4) and (2.5).

Remark: The condition (3.3) is effectively a lower bound on $\eta$. The control function $g(z)$ can be estimated by

$$\begin{cases} \min\big\{ \sqrt{\kappa + \eta},\ \max\{ \delta_+,\ \eta/\sqrt{\kappa},\ \kappa \} \big\}, & |E| \le 2 \text{ and } \kappa \ge \eta, \\ \sqrt{\kappa + \eta}, & \text{otherwise,} \end{cases} \tag{3.5}$$

up to some factor of order one. Note that the precise formula (3.2) for $g(z)$ is not important, only its asymptotic behaviour for small $\kappa$, $\eta$ and $\delta_+$ is relevant. The theorem remains valid if $g(z)$ is replaced by $\tilde g(z)$ with $\tilde g(z) \le C g(z)$. In particular, $g(z)$ can be chosen to be order one when $E$ is not near the edges of the spectrum. If we are only concerned with the case of generalized Wigner matrices, (2.6), we can choose $g(z) = O(\sqrt{\kappa + \eta})$ for any $z = E + i\eta$ ($\eta > 0$). Note that Theorem 2.1 was obtained by replacing $g(z)$ with the lower bound $\kappa \le C g(z)$ in Theorem 3.1.

Once the local semicircle law is established on scale $\eta \asymp 1/M$ (modulo logarithmic factors), we obtain the following supremum bound on the eigenvectors that can be interpreted as a lower bound of order $M$ on the localization length. The proof of this result is now simpler than in [12, 13], since we have a pointwise control on the diagonal elements of the Green's function. Let $u_\alpha$ denote the normalized eigenvector of $H$ belonging to the eigenvalue $\lambda_\alpha$, $\alpha = 1, 2, \ldots, N$, i.e., $H u_\alpha = \lambda_\alpha u_\alpha$ and $\|u_\alpha\| = 1$.



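Corollary 3.2 Let $H$ be as in Theorem 3.1. For any fixed $\kappa > 0$, there exists $C_\kappa$ such that

$$\mathbb{P}\Big\{ \exists\, \lambda_\alpha \in [-2 + \kappa, 2 - \kappa],\ H u_\alpha = \lambda_\alpha u_\alpha,\ \|u_\alpha\| = 1,\ \|u_\alpha\|_\infty \ge C_\kappa \frac{(\log N)^{13 + 6\alpha}}{M^{1/2}} \Big\} \le C N^{-c \log\log N}. \tag{3.6}$$

For the case of generalized Wigner matrices, (2.6), we have the following more precise bound

$$\mathbb{P}\Big\{ \exists\, \lambda_\alpha,\ H u_\alpha = \lambda_\alpha u_\alpha,\ \|u_\alpha\| = 1,\ \|u_\alpha\|_\infty \ge \frac{C (\log N)^{13 + 6\alpha}}{N^{1/2} \big[ \big| |\lambda_\alpha| - 2 \big| + N^{-1} \big]^{1/2}} \Big\} \le C N^{-c \log\log N}. \tag{3.7}$$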

Proof of Corollary 3.2. Let $\eta = C_\kappa M^{-1} (\log N)^{26 + 12\alpha}$; $C_\kappa$ can be chosen large enough so that (3.3) is satisfied for all $E \in [-2 + \kappa, 2 - \kappa]$, making use of (3.5). Choose $\{E_m\}$ as a grid of points in $[-2 + \kappa, 2 - \kappa]$ such that the distance between any two neighbors is of order $\eta$. Then with (3.4), we have

$$\mathbb{P}\Big\{ \max_j \max_m \mathrm{Im}\, G_{jj}(E_m + i\eta) \ge \mathrm{Im}\, m_{sc}(E_m + i\eta) + 1 \Big\} \le C N^{-c(\log\log N)}, \tag{3.8}$$

where we used $g(z) \le \sqrt{\kappa + \eta} \le C$ from (3.5). Then, with $|m_{sc}(z)| \le C$ (see (2.13)) and

$$\mathrm{Im}\, G_{jj}(E_m + i\eta) = \sum_\alpha \frac{\eta\, |u_\alpha(j)|^2}{(E_m - \lambda_\alpha)^2 + \eta^2}, \tag{3.9}$$

where $u_\alpha = (u_\alpha(1), u_\alpha(2), \ldots, u_\alpha(N))$, we have

$$\mathbb{P}\Big\{ \max_j \max_m \sum_\alpha \frac{\eta\, |u_\alpha(j)|^2}{|E_m - \lambda_\alpha|^2 + \eta^2} \ge C \Big\} \le C N^{-c(\log\log N)}. \tag{3.10}$$

By the definition of $E_m$, for any $\lambda_\alpha \in [-2 + \kappa, 2 - \kappa]$, there exists $m'$ such that $|E_{m'} - \lambda_\alpha|$ is of the order of $\eta$. Together with (3.10), we obtain (3.6).

In case of the generalized Wigner matrix (2.6), we have $g(z) = \sqrt{\kappa + \eta}$ and $M \asymp N$. Let $\eta$ be the solution to $\eta = N^{-1} (\log N)^{26 + 12\alpha} (\kappa + \eta)^{-3/2}$, then $N^{-1} \le \eta \le C N^{-1} (\kappa + N^{-1})^{-3/2} (\log N)^{26 + 12\alpha}$. With this choice of $\eta$, (3.3) is satisfied, and $\max_i |G_{ii} - m_{sc}| \le C (\log N)^{-2} (\kappa + \eta)^{1/2}$ holds with an overwhelming probability by (3.4). Since $|\mathrm{Im}\, m_{sc}(z)| \le C \sqrt{\kappa + \eta}$, we have $\max_i \mathrm{Im}\, G_{ii} \le C (\kappa + \eta)^{1/2}$. By the argument above, we obtain that $\|u_\alpha\|_\infty^2 \le C \eta (\kappa + \eta)^{1/2}$ on this event. This proves (3.7). □

To prove that $G_{ii}(z)$ is very close to $m_{sc}(z)$ in the sense of (3.4), we will also need to control the off-diagonal elements. In fact we will show that all $G_{ij}$ ($i \ne j$) are bounded by $O((M\eta)^{-1/2})$ up to some factor $(\log N)^C$. To state the result precisely, we first define some events in the probability space.

Recall that $\lambda_\alpha$, $\alpha = 1, 2, \ldots, N$, denote the eigenvalues of $H = (h_{ij})$. Denote by $\Omega^0$ the subset of the probability space such that

$$\max_\alpha |\lambda_\alpha| \le 3. \tag{3.11}$$

Let $\Omega_z^d$ (here the superscript $d$ means diagonal) be the subset of $\Omega^0$ where the following inequality on the diagonal terms holds for any $1 \le i \le N$

$$|G_{ii}(z) - m_{sc}(z)| \le (\log N)^{11 + 6\alpha} \frac{(\kappa + \eta)^{1/4}}{\sqrt{M\eta}\, g(z)} \tag{3.12}$$

(recall that $m_{sc}(z)$ was defined in (2.13)). Similarly, let $\Omega_z^o$ (here the superscript $o$ means off-diagonal) be the subset of $\Omega^0$ where the following inequality on the off-diagonal terms holds for any $1 \le i \ne j \le N$

$$|G_{ij}(z)| \le (\log N)^{5 + 4\alpha} \frac{(\kappa + \eta)^{1/4}}{\sqrt{M\eta}}. \tag{3.13}$$

Finally, denote by $\hat\Omega_z^d$ the set

$$\hat\Omega_z^d := \bigcap_{k=0}^{10 N^5} \Omega^d_{z + i k N^{-5}} \tag{3.14}$$

and similarly define $\hat\Omega_z^o$. These sets depend on $N$ but we suppress this from the notation.
Proof of Theorem 3.1. The following proposition immediately implies Theorem 3.1. □

Proposition 3.3 Suppose that the assumptions of Theorem 3.1 hold. Then, for sufficiently large $N$, we have

$$\mathbb{P}(\hat\Omega_z^d \cap \hat\Omega_z^o) \ge 1 - C N^{-c \log\log N} \tag{3.15}$$

for some positive constants $c$ and $C$.

Following the work of [12], we will use a continuity argument. In Section 4 we will derive a self-consistent equation of the form

$$G_{ii} + \frac{1}{z + \sum_j \sigma_{ij}^2 G_{jj} + \Upsilon_i(z)} = 0, \qquad i = 1, 2, \ldots, N. \tag{3.16}$$

Later we will give an explicit formula for $\Upsilon_i(z)$, but for now we take (3.16) as the definition of $\Upsilon_i$. Let $\Omega_z^\Upsilon(N) = \Omega_z^\Upsilon$ be the subset of $\Omega^0$ where the following inequality holds

$$\Upsilon = \Upsilon(z) := \max_i |\Upsilon_i(z)| \le (\log N)^{9 + 6\alpha} \frac{(\kappa + \eta)^{1/4}}{\sqrt{M\eta}}. \tag{3.17}$$

We will use Lemmas 3.4 and 3.5 below, which will be proved later in Sections 4 and 5.
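The role of the self-consistent equation (3.16) can be illustrated by a small consistency check: when the variance matrix has row sums equal to 1, as in (2.1), the constant vector $G_{ii} = m_{sc}(z)$ solves (3.16) with vanishing error term, because (3.16) then reduces to (2.12). The sketch below verifies this numerically. The names `m_sc` and `upsilon`, the $6 \times 6$ periodic tridiagonal variance matrix `B` and the point `z` are hypothetical choices made for illustration only.

```python
import cmath

def m_sc(z):
    """Root of m^2 + z*m + 1 = 0 with positive imaginary part, cf. (2.12)."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

def upsilon(G, B, z):
    """Solve (3.16) for the error terms, given diagonal entries G and variances B."""
    N = len(G)
    return [-1 / G[i] - z - sum(B[i][j] * G[j] for j in range(N))
            for i in range(N)]

N = 6
# hypothetical doubly stochastic variance matrix: periodic band of width 3
B = [[0.0] * N for _ in range(N)]
for i in range(N):
    for d in (-1, 0, 1):
        B[i][(i + d) % N] = 1 / 3

z = 0.3 + 0.05j
G = [m_sc(z)] * N                    # candidate solution G_ii = m_sc(z)
ups = upsilon(G, B, z)
print(max(abs(u) for u in ups))      # zero up to rounding
```

In the actual proof the error terms do not vanish; the point of (3.17) and of the lemmas that follow is that they are small with high probability, so that the true diagonal entries stay close to this exact solution.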
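Lemma 3.4 Let $z = E + i\eta$ be a fixed complex number satisfying (3.3). Then there are constants $C$ and $c$ such that for $N \ge N_0$, with $N_0$ sufficiently large independent of $E$ and $\eta$, the following estimates hold.

(1) Suppose $3 \le \eta \le 10$. Then

$$\mathbb{P}(\Omega^0) \ge 1 - C N^{-c \log\log N}, \tag{3.18}$$

$$\mathbb{P}(\Omega_z^o \cap \Omega_z^\Upsilon) \ge 1 - C N^{-c \log\log N}, \tag{3.19}$$

and for any $1 \le i \le N$,

$$\big| \mathbb{E}\, \mathbf{1}(\Omega_z^o)\, \Upsilon_i(z) \big| \le (\log N)^{10 + 8\alpha} \frac{(\kappa + \eta)^{1/2}}{M\eta} + C N^{-c(\log\log N)}. \tag{3.20}$$

(2) Suppose that $\eta \le 3$. Setting $z' = z + i N^{-5}$, we have

$$\mathbb{P}\big( \Omega_z^o \cap \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o \big) \ge \mathbb{P}\big( \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o \big) - C N^{-c \log\log N}, \tag{3.21}$$

$$\mathbb{P}\big( \Omega_z^o \cap \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o \cap \Omega_z^\Upsilon \big) \ge \mathbb{P}\big( \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o \big) - C N^{-c \log\log N}, \tag{3.22}$$

and for any $1 \le i \le N$,

$$\big| \mathbb{E}\, \mathbf{1}\big( \Omega_z^o \cap \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o \big)\, \Upsilon_i(z) \big| \le (\log N)^{10 + 8\alpha} \frac{(\kappa + \eta)^{1/2}}{M\eta} + C N^{-c(\log\log N)}. \tag{3.23}$$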

Lemma 3.5 Suppose we are on the event $\Omega_z^\Upsilon$ for some fixed $z = E + i\eta$ satisfying (3.3). Suppose either $3 \le \eta \le 10$ or the following inequality holds:

$$\max_i |G_{ii} - m_{sc}(z)| \le 2 (\log N)^{-2} g(z). \tag{3.24}$$

Then, for sufficiently large $N$, we have

$$\max_i |G_{ii} - m_{sc}(z)| \le (\log N) \frac{\Upsilon(z)}{g(z)}. \tag{3.25}$$



Proof of Proposition 3.3. Recall that $\Omega_z^d$ is the subset of $\Omega^0$ where (3.12) holds. Since on $\Omega_z^\Upsilon$ the bound (3.12) follows from (3.17) by (3.25), the case $3 \le \eta = \mathrm{Im}\, z \le 10$ follows from Lemma 3.4 and Lemma 3.5 by taking a union bound for $0 \le k \le 10 N^5$.

Now we prove (3.15) for the case $\eta < 3$ assuming that $z = E + i\eta$ satisfies (3.3). We have shown that (3.15) holds for $\eta = 3$; now we will successively decrease $\eta$ by $N^{-5}$ in each step, and we continue this inductive procedure as long as (3.3) is still satisfied for the reduced $\eta$. More precisely, let $z' = z + i N^{-5}$ and assume that (3.15) holds for $z'$. Our goal is to prove that

$$\mathbb{P}(\hat\Omega_z^d \cap \hat\Omega_z^o) \ge \mathbb{P}(\hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o) - C N^{-c \log\log N}. \tag{3.26}$$

The number of steps we will be taking is of order $N^5$. Since $N^{-c \log\log N} \ll N^{-5}$, this proves (3.15) provided that we can establish (3.26).
From (3.21), the difference between the probabilities of the sets $\Omega_z^o \cap \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o$ and $\hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o$ is negligible. With the definition of $\hat\Omega_z^d$ and $\hat\Omega_z^o$ in (3.14), we have

$$\hat\Omega_z^d \cap \hat\Omega_z^o \supset \Omega_z^d \cap \Omega_z^o \cap \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o. \tag{3.27}$$

Then, to prove (3.26), it remains to prove

$$\mathbb{P}\big( \Omega_z^d \cap \Omega_z^o \cap \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o \big) \ge \mathbb{P}\big( \Omega_z^o \cap \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o \big) - C N^{-c \log\log N}, \tag{3.28}$$

i.e., we need to estimate the probability of the complement of $\Omega_z^d$ on the set $\Omega_z^o \cap \hat\Omega_{z'}^d \cap \hat\Omega_{z'}^o$. On this set, using (3.22), we can assume that the estimate (3.17) holds with a very high probability. We will show below that (3.24) holds on $\hat\Omega_{z'}^d$. Then (3.25) together with (3.17) imply (3.12), the defining relation of $\Omega_z^d$. This will conclude (3.28) and complete the proof of Proposition 3.3. Therefore, we only have to verify (3.24).

Now we show that (3.24) holds on $\hat\Omega_{z'}^d$. Recall $z' = z + i N^{-5}$ and we have the trivial estimate

$$|G_{ii}(z) - m_{sc}(z)| \le |G_{ii}(z) - G_{ii}(z')| + |m_{sc}(z) - m_{sc}(z')| + |G_{ii}(z') - m_{sc}(z')|. \tag{3.29}$$

In the set $\hat\Omega_{z'}^d$, we have, with $\eta' = \mathrm{Im}\, z'$,

$$|G_{ii}(z') - m_{sc}(z')| \le (\log N)^{11 + 6\alpha} \frac{(\kappa + \eta')^{1/4}}{\sqrt{M\eta'}\, g(z')} \le (\log N)^{-2} g(z'), \tag{3.30}$$

where in the second inequality we used (3.3). By the definition of $g(z)$ from (3.2), we have $g(z) \le \sqrt{\kappa + \eta}$. Thus, if (3.3) holds, then, in particular,

$$\eta \ge C M^{-1} (\log N)^{26 + 12\alpha}. \tag{3.31}$$

This sets a lower bound on $\eta$. Together with $|z - z'| = N^{-5}$, we have the trivial continuity bound

$$|G_{ii}(z) - G_{ii}(z')| + |m_{sc}(z) - m_{sc}(z')| \le N^{-2},$$

using $|\partial_z m_{sc}(z)| \le |\mathrm{Im}\, z|^{-2}$, $|\partial_z G_{ii}(z)| = |[(H - z)^{-2}]_{ii}| \le \|(H - z)^{-2}\| \le |\mathrm{Im}\, z|^{-2}$ and $\eta \ge N^{-1}$ from (3.31). Thus

$$|G_{ii}(z) - m_{sc}(z)| \le N^{-2} + (\log N)^{-2} g(z'). \tag{3.32}$$

Using $|g(z)| \ge C\eta \ge C N^{-1}$ and $|g'(z)| \le C \eta^{-1} \le C N$ for $\eta \le 3$, we have the following estimate

$$|G_{ii}(z) - m_{sc}(z)| \le 2 (\log N)^{-2} g(z) \tag{3.33}$$

in the set $\hat\Omega_{z'}^d$. Thus the assumption (3.24) holds in the set $\hat\Omega_{z'}^d$. □

Under the assumptions of Theorem 3.1, with (3.15), (3.20), (3.23) and the definitions in (3.14), all these
$\Omega$'s are sets of almost full probability, i.e.,

P(f2°), P(Qf), P(f2°), P(fif), P(Oj) > 1 - CN closlosN (3.34) 

for some c, C > 0. 
4 Self-consistent equation for Green's function 



First, we introduce some notation.

Definition 4.1 For any collection of $s$ different numbers $k_1, k_2, \ldots, k_s \in \{1, 2, \ldots, N\}$, let $H^{(k_1, k_2, \ldots, k_s)}$
denote the $(N - s) \times (N - s)$ submatrix of $H$ obtained by removing the $k_i$-th ($1 \le i \le s$) rows and columns. Sometimes
we use the notation $H^{(T)}$, where $T$ denotes the unordered set $\{k_1, k_2, \ldots, k_s\}$. Similarly, we define $a^{(\ell;T)}$ to
be the $\ell$-th column of $H$ with the $k_i$-th ($1 \le i \le s$) elements removed. Sometimes we just use the short notation
$a^\ell = a^{(\ell;T)}$. For $T = \{k_1, k_2, \ldots, k_s\}$, we define

$G^{(T)}_{ij} := [H^{(T)} - z]^{-1}(i,j),$

$Z^{(T)}_{ij} := a^i \cdot [H^{(T)} - z]^{-1} a^j,$

$K^{(T)}_{ij} := h_{ij} - z\delta_{ij} - Z^{(T)}_{ij}.$

These quantities depend on $z$, but we mostly neglect this dependence in the notation.

We start the proof by deriving some identities between the matrix elements of $G = (H - z)^{-1}$ and $G^{(T)}$,
using the following well known result in linear algebra that we quote without proof.



Lemma 4.1 Let $A$, $B$, $C$ be $n \times n$, $m \times n$ and $m \times m$ matrices. We define the $(m + n) \times (m + n)$ matrix $D$ as

$D := \begin{pmatrix} A & B^* \\ B & C \end{pmatrix}$ (4.1)

and the $n \times n$ matrix $\widehat D$ as

$\widehat D := A - B^* C^{-1} B.$ (4.2)

Then for any $1 \le i, j \le n$, we have

$(D^{-1})_{ij} = (\widehat D^{-1})_{ij}$

for the corresponding matrix elements.

Furthermore, let $T$ denote the unordered set $\{k_1, k_2, \ldots, k_s\}$ with $1 \le k_i \le n$, $1 \le i \le s$. We define $D^{(T)}$
to be the $(n + m - s) \times (n + m - s)$ submatrix of $D$ after removing the $k_i$-th ($1 \le i \le s$) rows and columns and
define $\widehat D^{(T)}$ to be the $(n - s) \times (n - s)$ submatrix of $\widehat D$ after removing the $k_i$-th ($1 \le i \le s$) rows and columns.
Then for any $1 \le i, j \le n$ and $i, j \notin T$, we have

$((D^{(T)})^{-1})_{ij} = ((\widehat D^{(T)})^{-1})_{ij}$

for the corresponding matrix elements.



Using Lemma 4.1 and Definition 4.1, for $1 \le i \ne j \le N$, we have

$G_{ii} = (K^{(i)}_{ii})^{-1} = \frac{1}{h_{ii} - z - a^i \cdot (H^{(i)} - z)^{-1} a^i}.$ (4.3)

For the off-diagonal matrix elements $G_{ij}$ ($i \ne j$), we have

$G_{ij} = -G_{jj} G^{(j)}_{ii} K^{(ij)}_{ij} = -G_{ii} G^{(i)}_{jj} K^{(ij)}_{ij}.$ (4.4)

Similarly, we have the following result.

Lemma 4.2 Let $T$ be an unordered set $\{k_1, k_2, \ldots, k_s\}$ with $1 \le k_t \le N$ for $1 \le t \le s$, or $T = \emptyset$.
For simplicity, we use the notation $(i\,T)$ for $\{i\} \cup T$ and $(ij\,T)$ for $\{i, j\} \cup T$. Then we have the following
identities:

1. For any $i \notin T$

$G^{(T)}_{ii} = (K^{(iT)}_{ii})^{-1}.$ (4.5)

2. For $i \ne j$ and $i, j \notin T$

$G^{(T)}_{ij} = -G^{(T)}_{jj} G^{(jT)}_{ii} K^{(ijT)}_{ij} = -G^{(T)}_{ii} G^{(iT)}_{jj} K^{(ijT)}_{ij}.$ (4.6)

3. For $i \ne j$ and $i, j \notin T$

$G^{(T)}_{ii} - G^{(jT)}_{ii} = G^{(T)}_{ij} G^{(T)}_{ji} (G^{(T)}_{jj})^{-1}.$ (4.7)

4. For any indices $i$, $j$ and $k$ that are different and $i, j, k \notin T$

$G^{(T)}_{ij} - G^{(kT)}_{ij} = G^{(T)}_{ik} G^{(T)}_{kj} (G^{(T)}_{kk})^{-1}.$ (4.8)

Proof of Lemma 4.2. The first two identities (4.5) and (4.6) are obvious extensions of (4.3) and (4.4). To
prove (4.7), without loss of generality, we may assume that $i = 1$, $j = 2$ and $T = \emptyset$. Let $D = H - z$ and let $\widehat D$ be
defined as in (4.2) with $n = 2$ and $m = N - 2$. With Lemma 4.1, we have for $i, j = 1$ or $2$

$G_{ij} = (\widehat D^{-1})_{ij}$ (4.9)

and

$G^{(j)}_{ii} = ((\widehat D^{(j)})^{-1})_{ii}.$ (4.10)

Since $\widehat D$ is just a $2 \times 2$ matrix, one can easily check that (4.7) holds. With the same method, one can obtain
(4.8). □
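The resolvent identities (4.3), (4.4), (4.7) and (4.8) can be checked numerically on a small matrix. The sketch below is only an illustration, not part of the proof: it assumes $K^{(T)}_{ij} = h_{ij} - z\delta_{ij} - \sum_{k,l \notin T} h_{ik} G^{(T)}_{kl} h_{lj}$, and the helper names (`inverse`, `resolvent`, `k_entry`, `random_hermitian`) are hypothetical. It uses only the Python standard library.

```python
import random

def inverse(a):
    """Invert a small complex square matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(map(complex, row)) + [complex(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def resolvent(h, z, removed=()):
    """Entries of G^{(T)} = (H^{(T)} - z)^{-1}, keyed by the original indices."""
    keep = [i for i in range(len(h)) if i not in removed]
    sub = [[h[i][j] - (z if i == j else 0) for j in keep] for i in keep]
    inv = inverse(sub)
    return {(keep[a], keep[b]): inv[a][b]
            for a in range(len(keep)) for b in range(len(keep))}

def k_entry(h, z, i, j, removed):
    """K^{(T)}_{ij} under the assumed definition stated in the lead-in."""
    g = resolvent(h, z, removed)
    keep = [k for k in range(len(h)) if k not in removed]
    zsum = sum(h[i][k] * g[(k, l)] * h[l][j] for k in keep for l in keep)
    return h[i][j] - (z if i == j else 0) - zsum

def random_hermitian(n, seed=0):
    """Hermitian test matrix with Gaussian entries (illustrative choice)."""
    rng = random.Random(seed)
    h = [[0j] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = complex(rng.gauss(0, 1), 0)
        for j in range(i + 1, n):
            h[i][j] = complex(rng.gauss(0, 1), rng.gauss(0, 1))
            h[j][i] = h[i][j].conjugate()
    return h

h, z = random_hermitian(6), 0.3 + 0.5j
i, j, k = 0, 1, 2
g = resolvent(h, z)
g_j = resolvent(h, z, removed=(j,))   # G^{(j)}
g_k = resolvent(h, z, removed=(k,))   # G^{(k)}

err_43 = abs(g[(i, i)] - 1 / k_entry(h, z, i, i, (i,)))
err_44 = abs(g[(i, j)] + g[(j, j)] * g_j[(i, i)] * k_entry(h, z, i, j, (i, j)))
err_47 = abs(g[(i, i)] - g_j[(i, i)] - g[(i, j)] * g[(j, i)] / g[(j, j)])
err_48 = abs(g[(i, j)] - g_k[(i, j)] - g[(i, k)] * g[(k, j)] / g[(k, k)])
print(err_43, err_44, err_47, err_48)
```

All four residuals should be at the level of floating-point rounding, since the identities are exact consequences of the Schur complement formula.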

Lemma 4.3 The diagonal matrix elements of the resolvent satisfy the following self-consistent equation,

$G_{ii} = \frac{1}{-z - \sum_j \sigma_{ij}^2 G_{jj} + \Upsilon_i},$ (4.11)

where $\Upsilon_i(z)$ is given by

$\Upsilon_i(z) := \sigma_{ii}^2 G_{ii} + \sum_{j \ne i} \sigma_{ij}^2 \frac{G_{ij} G_{ji}}{G_{ii}} + \big(K^{(i)}_{ii} - E_{a^i} K^{(i)}_{ii}\big),$ (4.12)

and $E_{a^i}$ is the expectation over $a^i$. Let $T$ denote the set $\{k_1, k_2, \ldots, k_m\}$, which could also be the empty set;
then



IJ^-IW^I^^ (4.13) 



and for $i \ne j$

$|K^{(ijT)}_{ij}| \le (\log N)^{4+4\alpha} \Big[ M^{-1/2} + (M\eta)^{-1/2} \max_k \big\{ \mathrm{Im}\, G^{(ijT)}_{kk} \big\}^{1/2} \Big], \quad i \ne j,$ (4.14)

hold with a probability larger than $1 - CN^{-c\log\log N}$ for sufficiently large $N$.



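Proof of Lemma 4.3. We can write $G_{11}$ as follows,

$G_{11} = (K^{(1)}_{11})^{-1} = \frac{1}{E_{a^1} K^{(1)}_{11} + \big(K^{(1)}_{11} - E_{a^1} K^{(1)}_{11}\big)}.$ (4.15)

Using the fact that $G^{(1)} = (H^{(1)} - z)^{-1}$ is independent of $a^1$ and $E_{a^1} \overline{a^1(i)}\, a^1(j) = \delta_{ij} \sigma_{1j}^2$, we obtain

$E_{a^1} K^{(1)}_{11} = -z - \sum_{j \ne 1} \sigma_{1j}^2 G^{(1)}_{jj}.$ (4.16)

Combining this identity with (4.7), we have

$G_{11} = \Big( -z - \sum_j \sigma_{1j}^2 \big( G_{jj} - G_{1j} G_{j1} G_{11}^{-1} \big) + \big( K^{(1)}_{11} - E_{a^1} K^{(1)}_{11} \big) \Big)^{-1}.$ (4.17)

Clearly $G_{11}$ can be replaced with any $G_{ii}$, and this proves (4.11) with the definition (4.12).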
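Now we prove (4.13) and (4.14). Define

$v_{ij} = h_{ij}/\sigma_{ij},$ (4.18)

hence $E v_{ij} = 0$ and $E|v_{ij}|^2 = 1$. If $\sigma_{ij} = 0$, i.e., $h_{ij} = 0$ almost surely, then we set $v_{ij} = 0$. By the definition
of $K^{(iT)}_{ii}$, we write

$K^{(iT)}_{ii} = h_{ii} - z - \sum_{k,l \notin (iT)} \overline{a^i_k}\, G^{(iT)}_{kl} a^i_l = h_{ii} - z - \sum_{k,l \notin (iT)} v_{ik} \sigma_{ik} G^{(iT)}_{kl} \sigma_{li} v_{li}$ (4.19)

and

$E_{a^i} K^{(iT)}_{ii} = -z - \sum_{k \notin (iT)} \sigma_{ik} G^{(iT)}_{kk} \sigma_{ki}.$ (4.20)

We note that $h_{ii}$, $v_{ik}$ and $G^{(iT)}_{kl}$ are independent for $k, l \notin (iT)$. With the sub-exponential decay (2.14) and
$\sigma_{ij}^2 \le 1/M$, we have for any $i, j$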

v{\hij\ < {log Nf+^M' 1 ^ > 1 - CN- c( - loslosN \ (4.21) 
In Corollary B.3 of Appendix B we will prove a general large deviations result. Applying (B.15) to the last
term in (4.19), with the choice

$B_{kl} = \sigma_{ik} G^{(iT)}_{kl} \sigma_{li}$ (4.22)

and with $\sum_j \sigma_{ij}^2 = 1$ and $\sigma_{ij}^2 \le 1/M$, we obtain that



E VikPikG 



kl a li V H 



fcg(iT) 



(iT] 



Cfc) 



< (logiV) 



3+2q 



|G 



(iT) 
fefe 



|G 



(iT) 



(4.23) 



holds with a probability larger than $1 - CN^{-c\log\log N}$. Together with (4.21), we obtain that (4.13) holds
with a probability larger than $1 - CN^{-c\log\log N}$ for sufficiently large $N$.
Next we prove (4.14). By the definition of $K^{(ijT)}_{ij}$, $i \ne j$, we can write

$K^{(ijT)}_{ij} = h_{ij} - \sum_{k,l \notin (ijT)} v_{ik} \sigma_{ik} G^{(ijT)}_{kl} \sigma_{lj} v_{lj}.$ (4.24)

Applying (B.16), (4.21) and $\sigma_{ij}^2 \le 1/M$, we obtain that



\K^ T) \ < (logA0 4+4Q 



\ 



(4.25) 



holds with a probability larger than 1 — CN c ( 1 °s 1 °s Ar ) for sufficiently large TV. With Schwarz's inequality, 
for any 

1/2 / \ 1/2 

(4.26) 



E L r( yT L 



<fe^i 4 i G ir T) i 2 ) 7e 



Denote by $u^{(ijT)}_\alpha$ and $\lambda^{(ijT)}_\alpha$ ($\alpha = 1, 2, \ldots, N - |T| - 2$) the $\ell^2$-normalized eigenvectors and eigenvalues of
$H^{(ijT)}$. Let $u^{(ijT)}_\alpha(l)$ denote the $l$-th coordinate of $u^{(ijT)}_\alpha$; then for any $l$

$\sum_k |G^{(ijT)}_{kl}|^2 = \big( |G^{(ijT)}|^2 \big)_{ll} = \sum_\alpha \frac{|u^{(ijT)}_\alpha(l)|^2}{|\lambda^{(ijT)}_\alpha - z|^2} = \frac{\mathrm{Im}\, G^{(ijT)}_{ll}(z)}{\eta}.$ (4.27)

Here we defined $|A|^2 := A^* A$ for any matrix $A$. Inserting (4.27) into (4.25) and using the definition of $M$ in
(2.3), we obtain that (4.14) holds with a probability larger than $1 - CN^{-c\log\log N}$ for sufficiently large $N$.

□
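The identity (4.27), $\sum_k |G_{kl}|^2 = \mathrm{Im}\, G_{ll}/\eta$, holds for the resolvent of any Hermitian matrix. As a minimal sanity check, the sketch below verifies it for a $2 \times 2$ Hermitian matrix with a closed-form inverse; the matrix entries and the function names `resolvent_2x2` and `ward_gap` are illustrative assumptions.

```python
def resolvent_2x2(a, b, d, z):
    """Resolvent (H - z)^{-1} of the Hermitian matrix H = [[a, b], [conj(b), d]]."""
    det = (a - z) * (d - z) - b * b.conjugate()
    return [[(d - z) / det, -b / det],
            [-b.conjugate() / det, (a - z) / det]]

def ward_gap(a, b, d, E, eta):
    """Largest deviation between sum_k |G_kl|^2 and Im G_ll / eta over l."""
    G = resolvent_2x2(a, b, d, complex(E, eta))
    gap = 0.0
    for l in range(2):
        lhs = sum(abs(G[k][l]) ** 2 for k in range(2))
        rhs = G[l][l].imag / eta
        gap = max(gap, abs(lhs - rhs))
    return gap

# Illustrative parameters: real diagonal entries, complex off-diagonal entry.
gap = ward_gap(a=0.7, b=0.3 + 0.2j, d=-0.4, E=0.1, eta=0.05)
print(gap)
```

The printed gap should be at rounding level, which reflects that (4.27) is an exact spectral identity rather than an estimate.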

Proof of Lemma 3.4. We first prove (3.18) in the range $3 \le \eta \le 10$. Recall that $\Omega^0$ is the subset of the
entire probability space where $\|H\| \le 3$, see (3.11). By (7.11) from Lemma 7.2 (using that $M \ge (\log N)^9$ is
implied by (3.3)), we have $P(\Omega^0) \ge 1 - N^{-c\log\log N}$. Denote by $\lambda_\alpha$ and $u_\alpha$ the eigenvalues and eigenvectors of
$H = (h_{ij})$. From the identity

$G_{ii} = \sum_\alpha \frac{|u_\alpha(i)|^2}{\lambda_\alpha - z}$ (4.28)

and $\max_\alpha |\lambda_\alpha| \le 3$, we have that

$\eta^{-1} \ge |G_{ii}| \ge |\mathrm{Im}\, G_{ii}| \ge \frac{\eta}{(|E| + 3)^2 + \eta^2}$ (4.29)

holds in $\Omega^0$. Together with $3 \le \eta \le 10$ and $|E| \le 10$, we obtain

$c \le |G_{ii}| \le C$ (4.30)

with some positive constants. From the interlacing property of the eigenvalues of the matrix and its submatrices,
we find that not only $\|H\| \le 3$ but also $\|H^{(T)}\| \le 3$ holds on the set $\Omega^0$. Thus for any $j$, $k$ such that
$i$, $j$ and $k$ are all different, the bounds

$c \le |G_{ii}|, \ |G^{(j)}_{ii}|, \ |G^{(jk)}_{ii}| \le C$ (4.31)

hold in $\Omega^0$ by an argument similar to the one that led to (4.30). Thus (4.6) implies that, in $\Omega^0$,

$|G_{ij}| \le C\, |K^{(ij)}_{ij}|,$ (4.32)

and (3.18) follows by making use of (4.14) and $\eta \ge 3$.



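Now we prove (3.19). Recall that the self-consistent equation (3.16) holds with the error term $\Upsilon_i(z)$ given
by (4.12), i.e.,

$\Upsilon_i(z) = \sigma_{ii}^2 G_{ii} + \sum_{j \ne i} \sigma_{ij}^2 \frac{G_{ij} G_{ji}}{G_{ii}} + \big(K^{(i)}_{ii} - E_{a^i} K^{(i)}_{ii}\big).$ (4.33)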



Now we bound $\Upsilon_i(z)$ in $\Omega^0$. Since $\sigma_{ii}^2 \le M^{-1}$, with (4.31), the first term of the r.h.s. of (4.33) is less
than $O(M^{-1})$. Then with (2.1), and using the bound on $G_{ij}$ ($i \ne j$) from (3.13) and the one on $G_{ii}$ from
(4.31), we obtain that the second term of the r.h.s. of (4.33) is less than $C(\log N)^{10+8\alpha} (M\eta)^{-1}$ (and with
(3.31), we know it is much less than 1), i.e., in $\Omega^0$

$\Big| \sigma_{ii}^2 G_{ii} + \sum_{j \ne i} \sigma_{ij}^2 \frac{G_{ij} G_{ji}}{G_{ii}} \Big| \le C (\log N)^{10+8\alpha} (M\eta)^{-1}.$ (4.34)



The last term of the r.h.s. of (4.33) can be bounded, using (4.13) with $T = \emptyset$, with a very large probability.
Using (4.8) and (4.31), the $G^{(i)}_{kl}$'s in (4.13) can be bounded as

$|G^{(i)}_{kl}| \le |G_{kl}| + C |G_{ki}| |G_{il}|.$ (4.35)

Therefore, again with the bound on $G_{ij}$ ($i \ne j$) in (3.13) and the one on $G_{ii}$ from (4.31), we see that

$|K^{(i)}_{ii} - E_{a^i} K^{(i)}_{ii}| \le 2 (\log N)^{8+6\alpha} (M\eta)^{-1/2}$ (4.36)

holds in $\Omega^0$ with a probability larger than $P(\Omega^0) - CN^{-c\log\log N}$ for sufficiently large $N$. Inserting (4.34)
and (4.36) into (4.33), together with $\eta \ge 3$, we have proved (3.19).
Now we prove (3.20) for $\eta \ge 3$. By the definition of $\Upsilon_i$ in (4.33), we have



E 



i(n;)Ti(z) <Ei(n°) 



El 



(TO (a: 



w - E i 



(4.37) 



since E (if^ - E^K^ = 0. Using (4.31), in fl° we have that \a 2 li G li + J2 j7 n rrr/.', ,C'„f 1 is always less 
than a constant C for some C > 0. Inserting this and (4.34) into (4.37), we obtain that 



E 



l(fi;)Ti(*) < <7(log^) 10 + 8 «(M 77 )- 1 



E 



1 CN~ cio& lo s N 



We now claim that for some large enough C > there exists c > such that 



Z«|>iV c; )< e 



-N c 



and 



The first estimate follows from the definition of $Z^{(i)}_{ii}$ given in Definition 4.1 by using the sub-exponential
decay of the matrix elements and the trivial bound $|G^{(i)}_{kl}| \le \eta^{-1} \le N$. The second estimate is a
trivial consequence of the first one and the definition of $K^{(i)}_{ii}$. Together with (4.38), we obtain (3.20) in the
case that $3 \le \eta \le 10$.

We now prove (3.21) and (3.22) for the case 77 < 3 satisfying (3.3). We will work in the event f2f, fl Cl^, 
where z' = z + iN~ 5 . Similarly as we proved (3.33), from the bound below (3.31) and the Lipschitz continuity 
of g(z) , we obtain that 

(k + t/) 1 /* 



Kj?\>N c )<e 



-N'' 



(4.38) 



(4.39) 



\G u (z)-m sc (z)\<2(\ogN) 



11+6q 



/Mrjg(z) 



and 



\G ij (z)\<2(\ogN) 



5+4 



\fMr\ 



(4.40) 



(4.41) 



hold in Q, d z , n fif, . We note the r.h.s of these inequalities are much less than (log TV) -1 by (3.3). From the 
explicit formula (2.13) we obtain that $c \le |m_{sc}(z)| \le C$ for any $|z| \le 10$ with some positive constants. Using
this observation and the fact that the r.h.s. of (4.40) is much less than $(\log N)^{-1}$, we have

$c \le |G_{ii}(z)| \le C.$

Hence, using (3.31), (4.7), (4.8) and the lower bound of \Gu\, one can easily obtain that 

\G u (z)-m sc (z)\, \G${z)-m sc {z% \g£ \z) m sc (z)\ < C(\og 7V) 11+to (4.42) 

yMf] g(z) 

hold in n fi^,, (for the third term in l.h.s., we have also used the lower bounds of s as above). Then 
we also have 

c<\G u (z)\, \G${z)\, \G£\z)\<C (4.43) 

with some positive constants. 

The definition of $m_{sc}(z)$ implies $\mathrm{Im}\, m_{sc}(z) \le C\sqrt{\kappa + \eta}$. Then with (4.42), (3.3) and $g(z) \le \sqrt{\kappa + \eta}$, we
have that

$\mathrm{Im}\, G^{(ij)}_{kk}(z) \le C\sqrt{\kappa + \eta}$ (4.44)
holds in n flf, for some constant C > 0. Inserting it into (4.14), we obtain that 

$|K^{(ij)}_{ij}(z)| \le C (\log N)^{4+4\alpha} \frac{(\kappa + \eta)^{1/4}}{\sqrt{M\eta}}$ (4.45)

hold in Q,°, n n d z , with a probability larger than P(fi° nfif,) - G7V- c ( loglog ^ for sufficiently large N. Again, 
with (4.6) and (4.43), we obtain (3.21) for sufficiently large N. 
Then, as we proved in (4.34) and (4.36), we get that 

|^G Il (z) + ^4G u -G JI G^ 1 (z)|<C(log7V) 10+8 «(Af ?? )- 1 (4.46) 



\K\ 3 hz) - E aJ K\l ] {z)\ < G(logiV) 8+6Q (M? ? )- 1 / 2 (4.47) 



and 

If, . 

33 K > a 33 

hold in 17° n 0°, n Of, with a probability larger than P(ft° n nfi*') - GN- c ( log log w ) , which implies (3.22). 
Finally, similarly to how (4.37)-(4.38) were used to prove (3.20), we can obtain (3.23) in the case that $\eta \le 3$. □



5 Stability of the self-consistent equation: proof of Lemma 3.5

In this section, we prove Lemma 3.5, i.e., we prove the stability of the self-consistent equation with a
precise error estimate given in (3.25). We set $m_{sc} = m_{sc}(z)$ and $\Upsilon = \max_i |\Upsilon_i(z)|$ for simplicity of notation,
and we omit the $z$ dependence in all symbols. With the definition of $m_{sc}(z)$ in (2.12) and (2.13),
the following properties of $m_{sc}(z)$ can be easily established:

Lemma 5.1 Let $z = E + i\eta$ with $\eta > 0$ and $|z| \le 20$. Then we have

$|z + m_{sc}|^{-2} = |m_{sc}|^2 < 1$ (5.1)

and

$|(z + m_{sc}(z))^{-2} - 1| \ge C\sqrt{\kappa + \eta}$ (5.2)

for some constant $C$. Furthermore, suppose that either $2 \le |E| \le 10$ or $\kappa \le \eta$. Then

$|z + m_{sc}|^{-2} = |m_{sc}|^2 \le 1 - C\sqrt{\kappa + \eta}.$ (5.3)

For small values of $|z^2 - 4| \asymp \kappa + \eta$, $m_{sc}(z)$ has the asymptotic expansion

$m_{sc} = \mp 1 + \tfrac12 \sqrt{z^2 - 4} + O(|z^2 - 4|)$, near $z = \pm 2$. (5.4)

□
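The first two statements of Lemma 5.1 are easy to probe numerically. The sketch below assumes the standard form of (2.12)-(2.13), namely that $m_{sc}(z)$ is the root of $m^2 + zm + 1 = 0$ with positive imaginary part, and takes $\kappa = ||E| - 2|$; the grid of test points and the function names are illustrative choices.

```python
import cmath
import math

def m_sc(z):
    """Root of m^2 + z*m + 1 = 0 with Im m > 0 (assumed form of (2.13))."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

def check_lemma_5_1():
    """Return (worst error in (5.1), smallest ratio in (5.2)) over a grid."""
    worst_identity, min_ratio = 0.0, float("inf")
    for E in [x / 4 for x in range(-12, 13)]:        # E in [-3, 3]
        for eta in (1e-3, 1e-2, 1e-1, 1.0):
            z = complex(E, eta)
            m = m_sc(z)
            assert abs(m) < 1                         # strict bound in (5.1)
            worst_identity = max(
                worst_identity, abs(abs(z + m) ** -2 - abs(m) ** 2))
            kappa_plus_eta = abs(abs(E) - 2) + eta
            ratio = abs((z + m) ** -2 - 1) / math.sqrt(kappa_plus_eta)
            min_ratio = min(min_ratio, ratio)
    return worst_identity, min_ratio

worst_identity, min_ratio = check_lemma_5_1()
print(worst_identity, min_ratio)
```

The first number should be at rounding level, and the second stays bounded away from zero on this grid, consistent with (5.2). The lower bound can also be seen from the relation $m_{sc}^2 - 1 = m_{sc}\sqrt{z^2 - 4}$, which follows from the quadratic equation.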

We first prove (3.25) for the case that $3 \le \eta \le 10$. In this case, we can easily check that $g(z) = \sqrt{\kappa + \eta}$.
Denote the difference between $G_{ii}$ and $m_{sc}$ by

$v_i = G_{ii} - m_{sc}, \quad 1 \le i \le N.$

By the self-consistent equation (3.16), (2.1) and (2.12), we have

$v_i = \frac{\sum_j \sigma_{ij}^2 v_j - \Upsilon_i}{\big(z + m_{sc} + \sum_j \sigma_{ij}^2 v_j - \Upsilon_i\big)(z + m_{sc})}, \quad 1 \le i \le N.$ (5.5)

For $\eta \ge 3$, $|z + m_{sc}(z)| \ge 2$ by (2.13). Using $|G_{ii}| \le \eta^{-1}$ and $|m_{sc}| \le \eta^{-1}$, we obtain

$|v_i| \le 2/\eta \le 2/3, \quad 1 \le i \le N.$ (5.6)

From the assumption (3.17) and (3.3), we have $\Upsilon = \max_i |\Upsilon_i| \ll 1$ in this region. Together with $|z + m_{sc}(z)| \ge 2$
and (5.6), we obtain that the absolute value of the r.h.s. of (5.5) is less than

$\frac12 \cdot \frac{\sup_j |v_j|}{|z + m_{sc}| - \sup_j |v_j|} + O(\Upsilon).$ (5.7)

Taking the absolute value of (5.5) and maximizing over $i$, we have

$\sup_i |v_i| \le \frac12 \cdot \frac{\sup_i |v_i|}{|z + m_{sc}| - \sup_i |v_i|} + O(\Upsilon).$ (5.8)

The denominator satisfies $|z + m_{sc}(z)| - \sup_i |v_i| \ge 2 - 2/3 = 4/3$, therefore we obtain $\sup_i |v_i| = \sup_i |G_{ii} - m_{sc}(z)| \le O(\Upsilon)$,
which shows (3.25) for $3 \le \eta \le 10$.

Next, we prove (3.25) in the case that $\eta < 3$ with $\eta$ satisfying (3.3) and under the condition (3.24). Define

$m = m(z) := \frac1N \sum_i G_{ii}$ and $u_i := G_{ii} - m.$ (5.9)

Combining (3.17), (3.3), (3.24) with the fact that $g(z) \le C$, we can see that

$\Upsilon \le (\log N)^{-4} g^2(z) \le C (\log N)^{-4}, \quad |G_{ii} - m_{sc}(z)| \le C (\log N)^{-2}.$ (5.10)

Together with (5.1), we have

$|z + m_{sc}(z)| - |G_{ii} - m_{sc}(z)| - |\Upsilon| \ge C$

for some $C > 0$. Furthermore (3.24) implies

$|m(z) - m_{sc}| \le 2 (\log N)^{-2} g(z),$ (5.11)

thus there exists $C > 0$ such that

$|z + m(z)| - |G_{ii} - m(z)| - |\Upsilon| \ge C.$ (5.12)

Therefore, expanding the self-consistent equation (3.16) around $z + m(z)$, we obtain that

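$0 = G_{ii} + \frac{1}{z + \sum_j \sigma_{ij}^2 G_{jj} - \Upsilon_i} = G_{ii} + \frac{1}{z + m(z)} + \Omega_i,$ (5.13)

where $\Omega_i$ is defined by the second equality and it satisfies

$\Omega_i = -\frac{\sum_j \sigma_{ij}^2 u_j}{(z + m(z))^2} + O(\|u\|_\infty^2) + O(\Upsilon)$ (5.14)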
with error bounds uniform in $i$. Here $\|u\|_\infty = \max_i |u_i|$. Taking the average of the r.h.s. of (5.13) with
respect to $i$, we obtain that

$m(z) + \frac{1}{z + m(z)} = -\Omega,$ (5.15)

where

$\Omega := \frac1N \sum_i \Omega_i$ (5.16)

and it satisfies

$|\Omega| \le O(\|u\|_\infty^2) + O(\Upsilon).$ (5.17)

Here we used $\sum_i \sigma_{ij}^2 = 1$ and $\sum_j u_j = 0$. The bounds (3.24), (5.11) and $g(z) \le \sqrt{\kappa + \eta}$ (from (3.2)) imply
that

$\|u\|_\infty \le 4 (\log N)^{-2} g(z) \le C (\log N)^{-2} \sqrt{\kappa + \eta}.$ (5.18)

Together with (5.10) and (5.17), we obtain

$|\Omega| \le C (\log N)^{-4} (\kappa + \eta).$

To bound $m(z)$, we use the following lemma.

Lemma 5.2 Let $z = E + i\eta \in \mathbb{C}$, $|z| \le 10$, and let $\delta > 0$ be a sufficiently small constant. Let $t \in \mathbb{C}$ be such that

$|t| \le \delta (\kappa + \eta).$ (5.19)

Suppose there is a function $s_z(t) \in \mathbb{C}$ that solves the equation

$s_z(t) + \frac{1}{z + s_z(t)} = t,$ (5.20)

with $\mathrm{Im}\, s_z(t) > 0$, and that the estimate

$|s_z(t) - m_{sc}(z)| \le \delta \sqrt{\kappa + \eta}$ (5.21)

holds. Then

$|s_z(t) - m_{sc}(z)| \le C \frac{|t|}{\sqrt{\kappa + \eta}}$ (5.22)

for some constant $C > 0$.

Proof of Lemma 5.2. It follows from (5.20) that

$s_z(t) = t + \frac{-z - t \pm \sqrt{(z + t)^2 - 4}}{2}.$ (5.23)

We denote by $s^1(t)$ and $s^2(t)$ the two solutions of this equation, which are continuous with respect to $t$
locally in the neighborhood (5.19). When $t = 0$, one of them is equal to $m_{sc}(z)$; we choose $s^1(0) = m_{sc}(z)$.
From (5.23), we have

$|s^1 - s^2| = |(z + t)^2 - 4|^{1/2}.$ (5.24)

Then, for small enough $\delta$, if $|t| \le \delta(\kappa + \eta)$, then $|s^1 - s^2| \ge \tfrac12 \min\{|z - 2|, |z + 2|\}^{1/2}$ by using (5.24) and that
$\kappa + \eta \asymp \min\{|z - 2|, |z + 2|\}$. We thus see that only one out of $s^1$ and $s^2$ can satisfy (5.21). With the
assumption that $s^1(0) = m_{sc}(z)$, it is $s^1$ that satisfies (5.21). Then

$s_z(t) - m_{sc}(z) = s^1(t) - s^1(0)$ (5.25)

and (5.22) follows from the fact that

$|\partial_t s^1(t)| \le O\Big( \frac{1}{|(z + t)^2 - 4|^{1/2}} \Big) \le O\Big( \frac{1}{\sqrt{\kappa + \eta}} \Big),$ (5.26)

where for the second inequality we used $|t| \le \delta(\kappa + \eta)$. □
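The stability statement of Lemma 5.2 can be illustrated numerically. The sketch below solves (5.20) through the explicit formula (5.23), picks the root closer to $m_{sc}(z)$, and records the ratio $|s_z(t) - m_{sc}(z)| \sqrt{\kappa + \eta} / |t|$ over a grid. The value $\delta = 0.05$, the grid, and all function names are illustrative assumptions; $m_{sc}$ is taken to be the root of $m^2 + zm + 1 = 0$ with positive imaginary part.

```python
import cmath
import math

def m_sc(z):
    """Root of m^2 + z*m + 1 = 0 with Im m > 0 (assumed form of (2.13))."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

def nearest_solution(z, t):
    """The solution of s + 1/(z + s) = t, from (5.23), closest to m_sc(z)."""
    w = z + t
    root = cmath.sqrt(w * w - 4)
    candidates = [t + (-w + root) / 2, t + (-w - root) / 2]
    return min(candidates, key=lambda s: abs(s - m_sc(z)))

def worst_ratio(delta=0.05):
    """Largest |s - m_sc| * sqrt(kappa + eta) / |t| over the test grid."""
    worst = 0.0
    for E in [x / 4 for x in range(-10, 11)]:          # E in [-2.5, 2.5]
        for eta in (1e-3, 1e-2, 1e-1, 1.0):
            z = complex(E, eta)
            scale = abs(abs(E) - 2) + eta              # kappa + eta
            for step in range(8):                      # 8 directions of t
                t = delta * scale * cmath.exp(1j * math.pi * step / 4)
                s = nearest_solution(z, t)
                assert abs(s + 1 / (z + s) - t) < 1e-9  # s solves (5.20)
                ratio = abs(s - m_sc(z)) * math.sqrt(scale) / abs(t)
                worst = max(worst, ratio)
    return worst

ratio = worst_ratio()
print(ratio)
```

The ratio stays of order one across the grid, including at the spectral edges $E = \pm 2$ with small $\eta$, which is the content of (5.22).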
Using Lemma 5.2, for $s_z(t) = m(z)$ and $t = -\Omega$, we have

$|m(z) - m_{sc}(z)| \le C \frac{|\Omega|}{\sqrt{\kappa + \eta}} \le C (\log N)^{-2} \|u\|_\infty + C \frac{\Upsilon}{\sqrt{\kappa + \eta}},$ (5.27)

where in the second inequality we used (5.17) and (5.18). Subtracting (5.15) from (5.13), we have the
equation for $u$

$u_i = G_{ii} - m(z) = \frac{\sum_j \sigma_{ij}^2 u_j}{(z + m)^2} + \Omega + O(\|u\|_\infty^2) + O(\Upsilon) = w_i + \frac{\sum_j \sigma_{ij}^2 u_j}{(z + m_{sc})^2},$ (5.28)

where $w_i$ is defined as $u_i - (\sum_j \sigma_{ij}^2 u_j)(z + m_{sc})^{-2}$. By (5.17), it is bounded by

$\|w\|_\infty = O(\|u\|_\infty^2) + O\big( \|u\|_\infty |(z + m)^{-2} - (z + m_{sc})^{-2}| \big) + O(\Upsilon).$ (5.29)

Then, using (5.11) and (5.1), we obtain that

$|(z + m)^{-2} - (z + m_{sc})^{-2}| \le C |m(z) - m_{sc}(z)|.$ (5.30)

Inserting this into (5.29), using the bounds on $\|u\|_\infty$ in (5.18) and (5.27), we have

$\|w\|_\infty = O(\|u\|_\infty^2) + O(\Upsilon).$ (5.31)

From (5.3) in Lemma 5.1, whenever $|E| \ge 2$ or $\kappa \le \eta$, in which case $g(z) \asymp \sqrt{\kappa + \eta}$, we have

$|z + m_{sc}|^{-2} \le 1 - C\sqrt{\kappa + \eta}$ (5.32)

for some $C > 0$. Therefore (5.28) implies in this region that

$\|u\|_\infty \le C (\kappa + \eta)^{-1/2} \|w\|_\infty.$ (5.33)

Using (5.31), we get

$\|u\|_\infty \le \frac{C}{\sqrt{\kappa + \eta}} \|u\|_\infty^2 + \frac{C}{\sqrt{\kappa + \eta}} \Upsilon,$ (5.34)

and using $\|u\|_\infty \ll \sqrt{\kappa + \eta}$ from (5.18), we conclude that

$\|u\|_\infty \le O\Big( \frac{\Upsilon}{\sqrt{\kappa + \eta}} \Big).$ (5.35)

Combining this with the bound on $m - m_{sc}$ in (5.27) and $G_{ii} - m_{sc} = m - m_{sc} + u_i$, we obtain (3.25).



Finally, we consider the main interesting regime: $|E| \le 2$ and $\kappa \ge \eta$. We claim that the following
inequality about $m_{sc}(z)$ holds.

Lemma 5.3 Let $1 > \delta_- > 0$ be a given constant. Then there exist small real numbers $\tau \ge 0$ and $c_1 > 0$,
depending only on $\delta_-$, such that we have

$\max_{x \in [-1 + \delta_-, 1 - \delta_+]} \big| \tau + x\, m_{sc}^2(z) \big|^2 \le (1 - c_1 \tilde g(z)) (1 + \tau)^2$ (5.36)

with

$\tilde g(z) = \max\{\delta_+, |1 - \mathrm{Re}\, m_{sc}^2(z)|\}$ (5.37)

for any positive number $\delta_+$ such that $-1 + \delta_- \le 1 - \delta_+$.



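We postpone the proof of this lemma to the end of this subsection and first complete the main
argument. Recall that $B = \{\sigma_{ij}^2\}_{i,j=1}^N$ is the matrix of variances, which is symmetric. We also recall $\delta_\pm$ from
(2.4), and we will apply Lemma 5.3 with these $\delta_-$ and $\delta_+$. Fix $z$, set $\zeta := m_{sc}^2(z) = (m_{sc}(z) + z)^{-2}$ and
rewrite (5.28) as

$u = (I - \zeta B)^{-1} w = \frac{1}{1 + \tau} \Big( I - \frac{\zeta B + \tau}{1 + \tau} \Big)^{-1} w$ (5.38)

with $\tau$ given in Lemma 5.3. Define $Q := I - |e\rangle\langle e|$ to be the projection onto the orthogonal complement of
the normalized eigenvector $e = N^{-1/2} (1, 1, \ldots, 1)$ belonging to the simple eigenvalue 1 of $B$. Note that $B$
and $Q$ commute and that the spectrum of $BQ$ lies in $[-1 + \delta_-, 1 - \delta_+]$. Denote by $\|A\|$ the usual $\ell^2 \to \ell^2$
norm of a matrix $A$. Since

$\Big\| \frac{\zeta B + \tau}{1 + \tau} Q \Big\| \le \sup_{x \in [-1 + \delta_-, 1 - \delta_+]} \Big| \frac{\zeta x + \tau}{1 + \tau} \Big| \le (1 - c_1 \tilde g(z))^{1/2} < 1$

by Lemma 5.3 and $w \perp (1, \ldots, 1)$, the Neumann expansion of (5.38) converges on $\mathrm{span}((1, \ldots, 1))^\perp$ and

$u = \frac{1}{1 + \tau} \sum_{n=0}^\infty \Big( \frac{\zeta B + \tau}{1 + \tau} \Big)^n w.$ (5.39)

We will compute the $\ell^\infty \to \ell^\infty$ norm of this matrix. First note that

$\Big\| \frac{\zeta B + \tau}{1 + \tau} \Big\|_{\infty \to \infty} = \max_i \sum_j \Big| \Big( \frac{\zeta B + \tau}{1 + \tau} \Big)_{ij} \Big| \le \frac{1}{1 + \tau} \max_i \Big( \tau + |\zeta| \sum_j |B_{ij}| \Big) \le \frac{|\zeta| + \tau}{1 + \tau} < 1,$ (5.40)

since $|\zeta| = |m_{sc}|^2 < 1$ and $\sum_j |B_{ij}| = \sum_j B_{ij} = \sum_j \sigma_{ij}^2 = 1$. Then we have

$\Big\| \Big( \frac{\zeta B + \tau}{1 + \tau} \Big)^n Q u \Big\| \le \sup_{x \in [-1 + \delta_-, 1 - \delta_+]} \Big| \frac{\zeta x + \tau}{1 + \tau} \Big|^n \|u\| \le (1 - c_1 \tilde g(z))^{n/2} \|u\|$

by Lemma 5.3. Since for any $N \times N$ matrix $A$ we have $\|A\|_{\infty \to \infty} \le \sqrt{N} \|A\|$, we obtain

$\Big\| \Big( \frac{\zeta B + \tau}{1 + \tau} \Big)^n Q \Big\|_{\infty \to \infty} \le \sqrt{N} (1 - c_1 \tilde g(z))^{n/2}.$ (5.41)

Thus, estimating the first $n \le n_0 := (\log N)(c_1 \tilde g(z))^{-1}$ terms in (5.39) by (5.40), and the rest by (5.41), we
get

$\|u\|_\infty \le \Big( n_0 + \sqrt{N} \sum_{n = n_0}^\infty (1 - c_1 \tilde g(z))^{n/2} \Big) \|w\|_\infty \le C \frac{\log N}{\tilde g(z)} \|w\|_\infty.$

Using the bound (5.31) on $\|w\|_\infty$ and the bound (5.18) on $\|u\|_\infty$, we have

$\|u\|_\infty \le C (\log N)^{-1} \|u\|_\infty + C \frac{\log N}{\tilde g(z)} \Upsilon,$

which implies

$\|u\|_\infty \le C \frac{\log N}{\tilde g(z)} \Upsilon$ (5.42)

for some $C > 0$. Combining this with (5.27), we find

$|G_{ii} - m_{sc}| \le |m(z) - m_{sc}(z)| + |u_i| \le C \Big( \frac{\log N}{\tilde g(z)} + \frac{1}{\sqrt{\kappa + \eta}} \Big) \Upsilon,$ (5.43)

which implies (3.25), since $g(z) = \min\{\sqrt{\kappa + \eta}, \tilde g(z)\}$.

□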



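Proof of Lemma 5.3. First, if $\tilde g(z) = \delta_+$, then we choose $\tau = 0$. With $|m_{sc}| \le 1$ in (5.1), one can see that
(5.36) holds.

In the case of $\tilde g(z) = |1 - \mathrm{Re}\, m_{sc}^2|$, we have $\tilde g \le 2$ by using $|m_{sc}| \le 1$. We choose $\tau = \delta_-/10$; then

$\max_{x \in [-1 + \delta_-, 1 - \delta_+]} |\tau + x\, m_{sc}^2|^2 \le \max\Big\{ \Big| \frac{\delta_-}{10} - (1 - \delta_-) m_{sc}^2 \Big|^2, \ \Big| \frac{\delta_-}{10} + m_{sc}^2 \Big|^2 \Big\}$ (5.44)

and

$\Big| \frac{\delta_-}{10} - (1 - \delta_-) m_{sc}^2 \Big|^2 \le \Big( 1 - \frac{9 \delta_-}{10} \Big)^2 \le (1 - c\, \tilde g(z)) (1 + \delta_-/10)^2$ (5.45)

for some $c > 0$ depending on $\delta_-$. For the other term in the r.h.s. of (5.44), we have

$\Big| \frac{\delta_-}{10} + m_{sc}^2 \Big|^2 = |m_{sc}|^4 + 0.2\, \delta_- \mathrm{Re}(m_{sc}^2) + (\delta_-/10)^2.$ (5.46)

With $|m_{sc}| \le 1$ in (5.1) and $\tilde g(z) = |1 - \mathrm{Re}(m_{sc}^2)|$ in this case, (5.46) is bounded as

$\Big| \frac{\delta_-}{10} + m_{sc}^2 \Big|^2 \le \Big( 1 + \frac{\delta_-}{10} \Big)^2 - 0.2\, \delta_- \tilde g(z) \le \Big( 1 + \frac{\delta_-}{10} \Big)^2 (1 - C \tilde g(z))$ (5.47)

for some $C$ depending on $\delta_-$. At last, we complete the proof by combining (5.47) and (5.45).

□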
6 Proof of the universality of local statistics

We now outline the main steps to prove Theorem 2.2.

Step 1. Local relaxation flow. Following [19], we first prove that the local eigenvalue statistics of Dyson
Brownian motion (DBM) at a fixed time $t$ are the same as those of GUE if $t \asymp N^{-\varepsilon_0}$ for some $\varepsilon_0$. The DBM
is generated by the flow

$H_t = e^{-t/2} H_0 + (1 - e^{-t})^{1/2}\, V,$ (6.1)

where $H_0$ is the initial matrix and $V$ is an independent GUE matrix whose matrix elements are centered
Gaussian random variables with variance $1/N$. Strictly speaking, for each matrix element we have used
the Ornstein-Uhlenbeck (OU) process on $\mathbb{C}$ instead of the Brownian motion which was used in the original
definition of DBM in [19]. It is easy to check that the eigenvalues of $H_t$ follow a process very similar to
the original DBM in [19], but with a drift. With a slight abuse of terminology, we will still call this process
DBM. More precisely, let



$\mu = \mu_N(dx) = \frac{e^{-\mathcal{H}(x)}}{Z_\beta}\, dx, \quad \mathcal{H}(x) = N \Big[ \beta \sum_{i=1}^N \frac{x_i^2}{4} - \frac{\beta}{N} \sum_{i<j} \log |x_j - x_i| \Big]$ (6.2)

($\beta = 2$ for GUE) be the probability measure of the eigenvalues of the general $\beta$ ensemble, $\beta \ge 1$ (in this
section, we often use the notation $x_j$ for the eigenvalues to follow the notations of [19]). In this paper we
consider the $\beta = 2$ case for simplicity, but we stress that our proof applies to the case of symmetric matrices
as well. Denote the distribution of the eigenvalues at time $t$ by $f_t(x)\mu(dx)$. Then $f_t$ satisfies



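$\partial_t f_t = L f_t,$ (6.3)

where (see (2.2) in [19])

$L = \sum_{i=1}^N \frac{1}{2N} \partial_i^2 + \sum_{i=1}^N \Big( -\frac{\beta}{4} x_i + \frac{\beta}{2N} \sum_{j \ne i} \frac{1}{x_i - x_j} \Big) \partial_i.$ (6.4)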


Theorem 6.1 Suppose that the probability law for the initial matrix $H_0$ satisfies the assumptions of Theorem
2.2. Then there exists $\varepsilon_0 > 0$ such that for any

$t \ge N^{-\varepsilon_0},$ (6.5)

the probability law for the eigenvalues of $H_t$ satisfies (2.20), i.e., for any $k \ge 1$ and for any compactly
supported continuous test function $O : \mathbb{R}^k \to \mathbb{R}$, we have

$\lim_{b \to 0} \lim_{N \to \infty} \frac{1}{2b} \int_{E-b}^{E+b} dE' \int_{\mathbb{R}^k} d\alpha_1 \ldots d\alpha_k\, O(\alpha_1, \ldots, \alpha_k)\, \frac{1}{\varrho_{sc}(E)^k} \Big( p^{(k)}_{t,N} - p^{(k)}_{GUE,N} \Big) \Big( E' + \frac{\alpha_1}{N \varrho_{sc}(E)}, \ldots, E' + \frac{\alpha_k}{N \varrho_{sc}(E)} \Big) = 0,$

where $p^{(k)}_{GUE,N}$ is the $k$-point correlation function of the GUE ensemble.

Proof of Theorem 6.1. We first recall the following general theorem concerning the Dyson Brownian motion
from [19] that asserts that under four general assumptions, the local eigenvalue statistics of the time evolved
matrix $H_t$ coincide with GUE. The first assumption (called Assumption I in [19]) is a convexity bound on
$\mathcal{H}$ which is automatically satisfied in our case, and we only have to verify the following three assumptions.

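Assumption II. There exists a continuous, compactly supported density function $\varrho(x) \ge 0$, $\int_{\mathbb{R}} \varrho = 1$,
on the real line, independent of $N$, such that for any fixed $a, b \in \mathbb{R}$

$\lim_{N \to \infty} \sup_{t \ge 0} \Big| \int \frac1N \sum_{j=1}^N \mathbf{1}(x_j \in [a, b])\, f_t(x)\, d\mu(x) - \int_a^b \varrho(x)\, dx \Big| = 0.$ (6.6)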


For the next assumption, we introduce a notation. Let $\gamma_j = \gamma_{j,N}$ denote the location of the $j$-th point
under the limiting density, i.e., $\gamma_j$ is defined by

$N \int_{-\infty}^{\gamma_j} \varrho(x)\, dx = j, \quad 1 \le j \le N, \quad \gamma_j \in \mathrm{supp}\, \varrho.$ (6.7)

We will call $\gamma_j$ the classical location of the $j$-th point.
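As an illustration of (6.7), the classical locations can be computed by bisection once the limiting density is fixed. The sketch below assumes the semicircle density $\varrho(x) = \frac{1}{2\pi}\sqrt{4 - x^2}$ on $[-2, 2]$ (the case relevant for generalized Wigner matrices) and uses its closed-form distribution function; the function names and the tolerance are illustrative.

```python
import math

def semicircle_cdf(x):
    """Distribution function of rho(x) = sqrt(4 - x^2) / (2*pi) on [-2, 2]."""
    if x <= -2:
        return 0.0
    if x >= 2:
        return 1.0
    return (0.5 + x * math.sqrt(4 - x * x) / (4 * math.pi)
            + math.asin(x / 2) / math.pi)

def classical_locations(n, tol=1e-12):
    """Solve N * F(gamma_j) = j for j = 1..N by bisection on [-2, 2]."""
    locations = []
    for j in range(1, n + 1):
        lo, hi = -2.0, 2.0
        target = j / n
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if semicircle_cdf(mid) < target:
                lo = mid
            else:
                hi = mid
        locations.append((lo + hi) / 2)
    return locations

gammas = classical_locations(10)
print(gammas)
```

For even $N$ the middle location $\gamma_{N/2}$ sits at the origin, $\gamma_N$ sits at the right edge $2$, and the symmetry of the density gives $\gamma_j = -\gamma_{N-j}$ for $1 \le j \le N - 1$.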
Assumption III. There exists an $\varepsilon > 0$ such that

$\sup_{t \ge 0} \int \frac1N \sum_{j=1}^N (x_j - \gamma_j)^2 f_t(x)\, \mu(dx) \le C N^{-1-2\varepsilon}$ (6.8)

with a constant $C$ uniformly in $N$.

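The final assumption is an upper bound on the local density. For any $I \subset \mathbb{R}$, let

$\mathcal{N}_I := \sum_{i=1}^N \mathbf{1}(x_i \in I)$

denote the number of points in $I$.

Assumption IV. For any compact subinterval $I_0 \subset \{E : \varrho(E) > 0\}$ independent of $N$, and for any
$\delta > 0$, $\sigma > 0$ and $r > 0$, there are constants $c$ depending on $I_0$, $\delta$, $\sigma$ and $r$ such that for any interval $I \subset I_0$
with $|I| \ge N^{-1+\sigma}$, we have

$\sup_{\tau \ge N^{-2\varepsilon+\delta}} \int \mathbf{1}\{\mathcal{N}_I \ge K N |I|\}\, f_\tau\, d\mu \le N^{-c\log\log N}, \quad K = N^r,$ (6.9)

where $\varepsilon$ is the exponent from Assumption III.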



Theorem 6.2 [19, Theorem 2.1] Let $\varepsilon > 0$ be the exponent from Assumption III. Suppose that there is a
time $\tau < N^{-2\varepsilon}$ such that the following entropy bound holds

$S_\mu(f_\tau) := \int f_\tau \log f_\tau\, d\mu \le C N^m$ (6.10)

for some fixed exponent $m$. Suppose that the Assumptions II, III and IV hold for the solution $f_t$ of the
forward equation (6.3) for all time $t \ge \tau$. Let $E \in \mathbb{R}$ be a point where $\varrho(E) > 0$. Then for any $k \ge 1$ and
for any compactly supported continuous test function $O : \mathbb{R}^k \to \mathbb{R}$, we have

$\lim_{b \to 0} \lim_{N \to \infty} \sup_{t \ge N^{-2\varepsilon+\delta}} \frac{1}{2b} \int_{E-b}^{E+b} dE' \int_{\mathbb{R}^k} d\alpha_1 \ldots d\alpha_k\, O(\alpha_1, \ldots, \alpha_k)\, \frac{1}{\varrho(E)^k} \Big( p^{(k)}_{t,N} - p^{(k)}_{GUE,N} \Big) \Big( E' + \frac{\alpha_1}{N \varrho(E)}, \ldots, E' + \frac{\alpha_k}{N \varrho(E)} \Big) = 0.$ (6.11)

Theorem 6.2 is exactly Theorem 2.1 of [19] except that the assumption (6.10) on the entropy in [19] was
stated for the initial probability density $f_0$. Clearly, we can start the flow (6.4) from a fixed time $\tau \ll N^{-2\varepsilon+\delta}$
since the statement of Theorem 6.2 concerns only the time $t \ge N^{-2\varepsilon+\delta}$. In the case that the flow (6.4) is
generated from the matrix evolution (6.1), the entropy assumption (6.10) is satisfied automatically. To see
this, let $\nu_t^{ij}$ denote the probability measure of the $ij$-th element of the matrix $H_t$, $i \le j$, and $\bar\nu_t$ the probability
measure of the matrix $H_t$. Let $\bar\mu$ denote the probability measure of the GUE and $\mu^{ij}$ the probability measure
of its $ij$-th element, which is a Gaussian measure with mean zero and variance $1/N$. Since the dynamics of
matrix elements are independent (subject to the Hermitian condition), we have the identity

$\int \log\Big( \frac{d\bar\nu_t}{d\bar\mu} \Big) d\bar\nu_t = \sum_{i \le j} \int \log\Big( \frac{d\nu_t^{ij}}{d\mu^{ij}} \Big) d\nu_t^{ij}.$ (6.12)

The process $t \to \nu_t^{ij}$ is an Ornstein-Uhlenbeck process and each entropy term on the right hand side of the
last equation is bounded by $CN$ provided that $t \ge 1/N$ and $\nu_0^{ij}$ has a subexponential decay. This is easy
to check from the explicit OU kernel. Since the entropy of the marginal distribution on the eigenvalues is
bounded by the entropy of the total measure on the matrix, we have proved that

$\int f_{1/N} \log f_{1/N}\, d\mu \le C N^3,$ (6.13)

and this verifies (6.10). Therefore, in order to apply Theorem 6.2, we only have to verify the Assumptions
II, III and IV. Clearly, Assumption II follows from Theorem 2.1 (note that in the case of generalized Wigner
matrix, $M \asymp N$ and $g(z) \asymp \sqrt{\kappa + \eta}$). Assumption IV also follows from Theorem 2.1 by noting that
$\mathcal{N}_I \le C N \eta\, \mathrm{Im}\, m(E + i\eta)$ if $I$ is an interval of length $\eta$ about $E$. We also note that Assumption IV in [19] was
stated in a slightly stronger form, requiring a large deviation bound (6.9) for all $K \ge 1$, but inspecting the
proof of Theorem 2.1 of [19] reveals that Assumption IV is used only for $K$ larger than some positive power
of $N$ and smaller than $N$ (the main observation is that the upper limit of the summation in (7.16) of [19] is
effectively $N$ and not $\infty$).

Having verified all other assumptions, it remains to prove (6.8), which we state as the next theorem.

Theorem 6.3 Suppose $H$ satisfies the assumptions of Theorem 2.2, in particular, it is a generalized Wigner
matrix with positive constants $C_{inf}$, $C_{sup}$ in (2.6). Let $\tilde\nu_{ij}(x)dx := \sigma_{ij} \nu_{ij}(\sigma_{ij} x)dx$ be the rescaling of the
distributions $\nu_{ij}$ of the matrix elements and suppose that they satisfy the logarithmic Sobolev inequality (LSI)
with a constant $C_S$ independent of $N$, $i$, $j$, i.e.,

$\int u \log u\, d\tilde\nu_{ij} \le C_S \int |\nabla \sqrt{u}|^2\, d\tilde\nu_{ij}$ (6.14)

holds for any smooth probability density $u$, $\int u\, d\tilde\nu_{ij} = 1$. Denote by $\lambda_i$ the $i$-th eigenvalue of $H$ in increasing
order, $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_N$. Then there exists $\varepsilon > 0$ depending on $\alpha$, $\beta$ in (2.14) but independent of $C_{inf}$,
$C_{sup}$ and $C_S$ such that

$\frac1N E \sum_{i=1}^N (\lambda_i - \gamma_i)^2 \le C N^{-1-2\varepsilon},$ (6.15)

if $N$ is sufficiently large (depending on $C_{inf}$, $C_{sup}$, $C_S$, $\alpha$ and $\beta$).

The proof of Theorem 6.3 will be given in Section 7. It is easy to check that if an initial matrix $H = H_0$
satisfies the conditions of Theorem 6.3, then its evolution $H_t$ under the Ornstein-Uhlenbeck flow will also
satisfy these conditions with constants changed at most by a factor two. The main condition to check is
that the logarithmic Sobolev inequality (6.14) holds for $0 \le t \le 1$. But this was proved in the argument
following Lemma 5.3 of [15] using an estimate on the logarithmic Sobolev constant for convolution of two
measures, i.e., Lemma B.1 of [15]. Therefore Theorem 6.3 guarantees (6.15) for all positive times $t \ge 0$
and this proves Assumption III provided that the initial distribution satisfies the LSI (6.14). We have thus
proved Theorem 2.2 for matrix ensembles of the form

$h^t_{ij} = e^{-t/2} h_{ij} + (1 - e^{-t})^{1/2} N^{-1/2} \xi^G_{ij}, \quad t \ge N^{-2\varepsilon+\delta},$ (6.16)

where the $\xi^G_{ij}$ are i.i.d. complex random variables with Gaussian distribution with mean 0 and variance 1,
and the $h_{ij}$'s are independent random variables such that the rescaled variables $\zeta_{ij} = h_{ij}/\sigma_{ij}$ satisfy the LSI
assumption (6.14). In (6.16) $\delta > 0$ is arbitrary and $\varepsilon$ is fixed in Theorem 6.3. In particular, with the choice
$\delta = \varepsilon$ and $t \asymp N^{-\varepsilon}$, we have proved Theorem 2.2 for matrix ensembles $h_{ij} = \sigma_{ij} \zeta_{ij}$ if $\zeta_{ij}$ is of the form

$\zeta_{ij} = (1 - \gamma)^{1/2}\, \tilde\zeta_{ij} + \gamma^{1/2} \xi^G_{ij}, \quad \gamma \sim N^{-\varepsilon}$, distribution of $\tilde\zeta_{ij}$ satisfies (6.14). (6.17)

Step 2. Eigenvalue correlation function comparison theorem.

The next step is to prove that the correlation functions of eigenvalues for two matrix ensembles are
identical up to scale $1/N$ provided that the first four moments of all matrix elements of these two ensembles
are almost identical. This theorem is a corollary of Theorem 2.3 and we state it as the following correlation
function comparison theorem. The proof will be given in Section 8. Note that the assumption (2.21) in
Theorem 2.3 is satisfied by Theorem 3.1; in the case of generalized Wigner matrix we have $g(z) = \sqrt{\kappa + \eta}$ and
$M \asymp N$, so in the regime where $|E|$ is separated away from 2, we have from (3.4) that $G_{ii}(z)$ is uniformly
bounded (modulo logarithmic factors).

Theorem 6.4 Suppose the assumptions of Theorem 2.3 hold. Let $p^{(k)}_{v,N}$ and $p^{(k)}_{w,N}$ be the $k$-point functions
of the eigenvalues w.r.t. the probability law of the matrix $H^{(v)}$ and $H^{(w)}$, respectively. Then for any $|E| < 2$,
any $k \ge 1$ and any compactly supported continuous test function $O : \mathbb{R}^k \to \mathbb{R}$ we have

$\lim_{N \to \infty} \int_{\mathbb{R}^k} d\alpha_1 \ldots d\alpha_k\, O(\alpha_1, \ldots, \alpha_k) \Big( p^{(k)}_{v,N} - p^{(k)}_{w,N} \Big) \Big( E + \frac{\alpha_1}{N}, \ldots, E + \frac{\alpha_k}{N} \Big) = 0.$ (6.18)

Step 3. Approximation of a measure by Ornstein-Uhlenbeck process for small time.

Summarizing, we have proved Theorem 2.2 in Step 1 for matrix ensembles whose probability distributions
of the normalized matrix elements $\zeta_{ij}$ are of the form (6.17). Using the Green's function comparison theorem,
i.e. Theorem 6.4, we extended the class of distributions to all random variables whose first four moments can
almost be matched (more precisely, match the first three moments and almost match the fourth moments
in the sense of (2.22)) by random variables in the class (6.17). In order to complete the proof of Theorem
2.2, it remains to prove that for all measures in the class given by the assumptions of Theorem 2.2, i.e.,
measures satisfying the subexponential decay condition, the uniformly bounded-variance condition (2.6) and
the moment restriction (2.19) for the real and imaginary parts, we can find random variables in the class
(6.17) to almost match the first four moments. Since the real and imaginary parts are i.i.d., it is sufficient
to match them individually, i.e., we can work with real random variables normalized to variance one. This
is the content of the following Lemma 6.5. Notice that the uniformity in the conditions (2.19) and (2.14)
guarantees that the bounds (6.19) hold with uniform constants $C_1$, $C_2$. This implies the uniformity of the
LSI constants, needed in Theorem 6.3, for the random variables constructed in Lemma 6.5. The proof of
this lemma will be given in Appendix C. We have thus proved Theorem 2.2. □

Lemma 6.5 Let $m_3$ and $m_4$ be two real numbers such that

$$m_4 - m_3^2 - 1 \ge C_1, \qquad m_4 \le C_2 \tag{6.19}$$

for some positive constants $C_1$ and $C_2$. Then for any sufficiently small $\gamma > 0$ (depending on $C_1$ and $C_2$), there
exists a real random variable $\xi_\gamma$ whose distribution satisfies LSI and the first 4 moments of

$$\xi' = (1-\gamma)^{1/2}\,\xi_\gamma + \gamma^{1/2}\,\xi^G \tag{6.20}$$

are $0$, $1$, $m_3(\xi') = m_3$ and $m_4(\xi')$, and

$$|m_4(\xi') - m_4| \le C\gamma \tag{6.21}$$

for some $C$ depending on $C_1$ and $C_2$, where $\xi^G$ is a real Gaussian random variable with mean 0 and variance
1, independent of $\xi_\gamma$. The LSI constant of $\xi_\gamma$ (and thus $\xi'$) is bounded from above by a function of $C_1$ and
$C_2$.
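To see why a small Gaussian component changes only the fourth moment, and only at order $\gamma$, one can expand the moments of $\xi'$ in (6.20) directly. The following is a sketch under the stated hypotheses only ($\xi_\gamma$ has mean 0 and variance 1 and is independent of the standard Gaussian $\xi^G$); the last line assumes, purely for illustration, that $\mathbb E\,\xi_\gamma^4 = m_4$, which is not necessarily the choice made in Appendix C.

```latex
\begin{aligned}
\mathbb E\,\xi' &= 0, \qquad
\mathbb E\,(\xi')^2 = (1-\gamma) + \gamma = 1, \\
\mathbb E\,(\xi')^3 &= (1-\gamma)^{3/2}\,\mathbb E\,\xi_\gamma^3
   \qquad\text{(odd Gaussian moments and } \mathbb E\,\xi_\gamma \text{ vanish)}, \\
\mathbb E\,(\xi')^4 &= (1-\gamma)^2\,\mathbb E\,\xi_\gamma^4 + 6\gamma(1-\gamma) + 3\gamma^2 . \\
\text{If } \mathbb E\,\xi_\gamma^4 = m_4:\quad
\mathbb E\,(\xi')^4 - m_4 &= \gamma\,(6 - 2m_4) + \gamma^2\,(m_4 - 3) = O(\gamma).
\end{aligned}
```

In particular, matching $m_3(\xi') = m_3$ requires $\mathbb E\,\xi_\gamma^3 = (1-\gamma)^{-3/2} m_3$, while the fourth moment can be matched only up to an error of order $\gamma$, in line with (6.21).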



7 Proof of Theorem 6.3 

Theorem 6.3 states that the eigenvalues are at a distance $N^{-1/2-\varepsilon}$ from their classical locations in a quadratic
average sense. We will deduce this conclusion from the information on the closeness of the local density to
the semicircle law. We note that the constants appearing in this section may also depend on $\alpha$ and $\beta$ in (2.14),
but we will not mention the dependence in the proof.

First we reformulate a result, which we have proved in [19], in a somewhat more general setup. It states
that random points, $\lambda_j$, are close to a fixed set of locations, $\gamma_j$, if the local fluctuation is controlled, if the
averaged counting function is close to the counting function of the $\gamma_j$'s in $L^1$-sense and if some tightness
holds. For simplicity, the result is stated for the case when the $\gamma_j$'s are the classical locations given by the
semicircle law $\varrho = \varrho_{sc}$, (6.7), but the statement (and its proof) holds for any density function with support
being a compact interval and with square root singularity at the edges. In particular, we applied this result
in [19] for the Marchenko-Pastur (MP) distribution instead of the semicircle law. The counting function of
the $\gamma_j$'s can be replaced by its continuous version, i.e., by the distribution function of the semicircle law, which
is defined by

$$n_{sc}(E) := \int_{-\infty}^{E} \varrho_{sc}(x)\,dx. \tag{7.1}$$
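For orientation, the distribution function in (7.1) has an elementary closed form, and the classical locations can be computed from it numerically. The sketch below is illustrative only: it assumes the standard semicircle density $\varrho_{sc}(x) = \frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$ and assumes that the classical location $\gamma_j$ is defined by $n_{sc}(\gamma_j) = j/N$ (the definition (6.7) is not reproduced here); the function names are hypothetical.

```python
import math

def rho_sc(x: float) -> float:
    """Semicircle density on [-2, 2] (assumed normalization)."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)

def n_sc(E: float) -> float:
    """Distribution function of the semicircle law, cf. (7.1), in closed form."""
    if E <= -2.0:
        return 0.0
    if E >= 2.0:
        return 1.0
    return 0.5 + E * math.sqrt(4.0 - E * E) / (4.0 * math.pi) + math.asin(E / 2.0) / math.pi

def classical_location(j: int, N: int, tol: float = 1e-12) -> float:
    """Solve n_sc(gamma) = j / N by bisection (assumed definition of gamma_j)."""
    lo, hi = -2.0, 2.0
    target = j / N
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_sc(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since $n_{sc}$ is strictly increasing on $[-2,2]$, bisection suffices; near the edges the square root singularity makes the spacing of the $\gamma_j$'s of order $N^{-2/3}$ rather than $N^{-1}$.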
Lemma 7.1 Let $\lambda_1 < \lambda_2 < \ldots < \lambda_N$ be an ordered collection of random points in $\mathbb R$. Denote the averaged
counting function of the $\lambda_j$'s by

$$n^\lambda(E) = \frac{1}{N}\,\mathbb E\,\#[\lambda_j \le E]. \tag{7.2}$$

Suppose the following four assumptions hold.

1. [Tightness at the edge] There exist $m < 7$ and $\varepsilon > 0$ such that

$$n^\lambda(-2 - N^{-1/m}) \le C e^{-N^{\varepsilon}} \quad\text{and}\quad n^\lambda(2 + N^{-1/m}) \ge 1 - C e^{-N^{\varepsilon}} \tag{7.3}$$

and for any $K \ge 3$,

$$n^\lambda(K) \ge 1 - e^{-N^{\varepsilon}\log K} \quad\text{and}\quad n^\lambda(-K) \le e^{-N^{\varepsilon}\log K}. \tag{7.4}$$

2. [$L^1$-closeness of the counting functions]

$$\int_{-\infty}^{\infty} |n^\lambda(E) - n_{sc}(E)|\,dE \le C N^{-6/7}. \tag{7.5}$$

3. [Fluctuation of moving averages] For any small $\delta > 0$ there is a constant $C$ such that for any $j, K \in \mathbb N$
with $j + K \le N + 1$, the local averages $\lambda_{j,K} := K^{-1}\sum_{i=0}^{K-1}\lambda_{j+i}$ satisfy

P (|A ijif -EAj,jr| > at-V2+^-i/2^ < Ce -^ /2 . (7 . 6) 

4. [Positivity of the bulk density] There exists a small enough $\delta > 0$ such that for any interval $I$ with
$|I| = N^{-5/8}$ and $I \subset [-2 + N^{-\delta}, 2 - N^{-\delta}]$, the number of the $\lambda$'s in $I$ is bounded from below as follows

$$\mathbb P\big(\#\{\lambda_j \in I\} \ge N^{-\delta} N |I|\big) \ge 1 - C N^{-c\log\log N}. \tag{7.7}$$

Then there exists $\varepsilon > 0$ (independent of the constants in these four assumptions) such that

$$\frac{1}{N}\sum_{j=1}^{N}\mathbb E(\lambda_j - \gamma_j)^2 \le C N^{-1-\varepsilon} \tag{7.8}$$

when $N$ is large enough (depending on the constants in these four assumptions).

Proof of Lemma 7.1. In Theorem 9.1 of [19] we have proved the analogous result on the singular values
of the covariance matrix, where the role of the semicircle law was played by the MP law and the spectral
edges, $\pm 2$, were replaced by $\lambda_\pm$, the two edges of the support of the MP distribution. In that paper we first
proved the analogues of these four assumptions, then we presented the proof of (7.8) via a general argument
that used only these assumptions. Inspecting the proofs of Lemmas 9.5, 9.6 and 9.7 in [19], leading to (7.8),
we observe that only equations (9.6), (9.8), (9.9) and (9.13) from [19] were used, in addition to the lower
bound on the density of the points on the scale $N^{-5/8}$, which is used below (9.51) of [19]. The lower bound
on the density is granted by the last assumption (7.7) (even with a better control on the probability than
we required in [19]). Repeating the argument from [19], for the proof of Lemma 7.1 it is sufficient to check
that the first three assumptions in Lemma 7.1 imply equations (9.6), (9.8), (9.9) and (9.13) in [19]. We now
explain how to obtain these necessary bounds from our assumptions.
The first condition (7.3) corresponds to the input for Lemma 9.2 in [19], in particular, the analogue of
(9.6) of [19],

$$-2 - N^{-1/m} \le \mathbb E\lambda_j \le 2 + N^{-1/m},$$

follows immediately from (7.3) and (7.4). We note that (9.6) in [19] contains a threshold $N^{-1/5}$ but actually
in the proof we only needed it to be much less than $N^{-1/7}$ (see (9.36)-(9.37) of [19] for the application of
(9.6)).

The second condition (7.5) corresponds to Eq. (9.8) in [19]. As we showed in the proof of Lemma 9.3 of
[19], Eq. (9.9) directly follows from (9.8). Here the analogous bound

$$\sup_E |n^\lambda(E) - n_{sc}(E)| \le C N^{-3/7}$$

follows directly from (7.5) in the same way.

Finally, the third condition (7.6) is exactly the same as (9.13) in [19]. Simply repeating now the proof of
Theorem 9.1 from [19], we have proved Lemma 7.1. □

Theorem 6.3 will now follow from Lemma 7.1 if we prove that the four conditions in Lemma 7.1 hold
in the case of generalized Wigner matrices (2.6). The last condition (7.7) follows from the local semicircle
law (Theorem 2.1) and from the fact that $\varrho_{sc}(x) \ge c\sqrt{\kappa}$ for $x \in (-2 + \kappa, 2 - \kappa)$. Here we list the first three
conditions as three separate lemmas that will be proven in the next three subsections. This will complete
the proof of Theorem 6.3. □

Lemma 7.2 (1) Let $H$ be a generalized Wigner matrix with subexponential decay; in fact it is sufficient to
assume that (2.14) and the upper bound $C_{sup} < \infty$ in (2.6) hold. Define $n^\lambda(E)$ as in (7.2). Then

$$n^\lambda(-2 - N^{-1/6+\varepsilon}) \le C e^{-N^{\varepsilon'}} \quad\text{and}\quad n^\lambda(2 + N^{-1/6+\varepsilon}) \ge 1 - C e^{-N^{\varepsilon'}} \tag{7.9}$$

for any small $\varepsilon > 0$ with an $\varepsilon' > 0$ depending on $\varepsilon$. Furthermore, for $K \ge 3$,

$$n^\lambda(-K) \le e^{-N^{\varepsilon}\log K} \quad\text{and}\quad n^\lambda(K) \ge 1 - e^{-N^{\varepsilon}\log K} \tag{7.10}$$

for some $\varepsilon > 0$.

(2) In fact, the last tightness bound holds in a more general situation, namely, let the universal Wigner
matrix $H$ satisfy (2.1), (2.14) and $M \ge (\log N)^9$ where $M$ is defined in (2.3). Then we have

$$n^\lambda(-3) \le C N^{-c\log\log N} \quad\text{and}\quad n^\lambda(3) \ge 1 - C N^{-c\log\log N}. \tag{7.11}$$

Lemma 7.3 Let $H$ satisfy the conditions of Theorem 6.3. Then for any $\varepsilon > 0$ we have

$$\int_{-\infty}^{\infty} |n^\lambda(E) - n_{sc}(E)|\,dE \le C N^{-1+\varepsilon}. \tag{7.12}$$



Lemma 7.4 Let $H$ satisfy the conditions in Theorem 6.3, in particular, let the distribution of the matrix
elements satisfy the uniform LSI (6.14). For $j, K \in \mathbb N$, $j + K \le N + 1$, define $\lambda_{j,K} := K^{-1}\sum_{i=0}^{K-1}\lambda_{j+i}$.
Then for any $\delta > 0$ small enough,

$$\mathbb P\big(|\lambda_{j,K} - \mathbb E(\lambda_{j,K})| \ge N^{-1/2+\delta}K^{-1/2}\big) \le C e^{-N^{\delta}}, \tag{7.13}$$

with $C$ depending on $C_{sup}$ in (2.6) and $C_S$ in (6.14).
7.1 Proof of Lemma 7.2.

Extreme eigenvalues are typically controlled by the moment method, evaluating $\mathbb E\,\mathrm{Tr}\,H^k$ for large $k$ using
some graphical representation. Our proof follows the standard path, but since we were unable to find a
reference that would apply precisely to our case, we include the proof for completeness. The main technical
estimate (7.18) is borrowed from [34]. We remark that if we use the strongest result in [34], one can improve
the exponent 1/6 to 1/4 in (7.9).

We start with the proof of (7.9) and (7.10) in the case of generalized Wigner matrices (see (2.6)). First
we truncate the random variables. With the assumption of subexponential decay of $h_{ij}$, for any small $\delta > 0$,
one can find a $\widetilde h_{ij}$ such that

$$\mathbb P(\widetilde h_{ij} = h_{ij}) \ge 1 - e^{-N^{\varepsilon'}} \tag{7.14}$$

and

$$|\widetilde h_{ij}| \le N^{-1/2+\delta}, \qquad \mathbb E(\widetilde h_{ij}) = 0, \qquad \mathbb E(|\widetilde h_{ij}|^2) \le \mathbb E(|h_{ij}|^2) \tag{7.15}$$

for some small number $\varepsilon'$, depending on $\delta$. Then we only need to bound the spectral norm of the new matrix
$\widetilde H = (\widetilde h_{ij})$. To prove (7.9), it only remains to prove that, for some small $\varepsilon' > 0$,

$$\mathbb P\big(\|\widetilde H\| \ge 2 + N^{-1/6+\varepsilon}\big) \le e^{-N^{\varepsilon'}}. \tag{7.16}$$

With

$$\mathbb P\big(\|\widetilde H\| \ge 2 + N^{-1/6+\varepsilon}\big) \le \frac{\mathbb E\,\mathrm{Tr}\,\widetilde H^{k}}{(2 + N^{-1/6+\varepsilon})^{k}}$$

for any even $k$, (7.16) follows from

$$\mathbb E\,\mathrm{Tr}\,\widetilde H^{k_0} \le 2^{k_0 + O(1)\log N}, \tag{7.17}$$

with the choice of $k_0 = N^{1/6-\delta/3}$ and $\delta = 3\varepsilon/2$, since $\|\widetilde H\|^k \le \mathrm{Tr}\,\widetilde H^k$ for even powers. The proof of (7.10) is
analogous.
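The arithmetic behind the choice of $k_0$ can be spelled out as follows. This is only a sketch: it reads (7.17) as a bound of the form $2^{k_0}N^{C}$ with some constant $C$ (an assumption about the precise form of the error factor), and it ignores the rounding of $k_0$ to an even integer.

```latex
\begin{aligned}
&\text{With } \delta = 3\varepsilon/2:\quad k_0 = N^{1/6-\delta/3} = N^{1/6-\varepsilon/2},
 \qquad x := \tfrac12 N^{-1/6+\varepsilon} \in (0,1]. \\
&(2 + N^{-1/6+\varepsilon})^{k_0} = 2^{k_0}(1+x)^{k_0}
   \ge 2^{k_0}\exp\!\big(\tfrac12 k_0 x\big)
   = 2^{k_0}\exp\!\big(\tfrac14 N^{\varepsilon/2}\big)
   \qquad (\log(1+x) \ge x/2 \text{ for } 0 \le x \le 1). \\
&\text{Hence, by Markov's inequality,}\quad
 \mathbb P\big(\|\widetilde H\| \ge 2 + N^{-1/6+\varepsilon}\big)
   \le \frac{2^{k_0} N^{C}}{2^{k_0}\exp\!\big(\tfrac14 N^{\varepsilon/2}\big)}
   = N^{C}\exp\!\big(-\tfrac14 N^{\varepsilon/2}\big)
   \le e^{-N^{\varepsilon'}}
\end{aligned}
```

for any fixed $\varepsilon' < \varepsilon/2$ and $N$ large enough, which is the form of (7.16).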

To estimate $\mathbb E(\mathrm{Tr}\,H^k)$ for $k \in \mathbb N$, we start with introducing some notations and concepts on graphs.

Let $p$ and $k$ be given integers. We define the concept of ordered closed walk of $k$ edges on an abstract
ordered set $A_p := \{a_1, a_2, \ldots, a_p\}$ of $p$ elements with the natural ordering $a_1 < a_2 < \ldots < a_p$. An ordered
closed walk on $p$ vertices with $k$ edges is determined by a sequence $w = (w_1, w_2, \ldots, w_k)$ of the elements of
$A_p$ with the following properties:

i) Along the walk, the fresh vertices from $A_p$ are adjoined in increasing order, i.e., $\max_{j \le m} w_j \le
\max_{j \le m-1} w_j + 1$.

ii) $\{w_1, w_2, \ldots, w_k\} = A_p$, i.e., all points of $A_p$ are visited.

iii) Let $\Gamma(w)$ denote the undirected graph associated with $w$, i.e., the vertex set of $\Gamma(w)$ is $A_p$, the edges
are given by $(w_1, w_2), (w_2, w_3), \ldots, (w_k, w_1)$; with multiple edges as well as self-loops ($w_i = w_{i+1}$ for
some $i$) allowed. Then every edge of $\Gamma$ appears at least twice.

Let $\mathcal W(k, p)$ denote the set of ordered closed walks on $p$ vertices with $k$ edges. Their number was estimated
in Lemma 2.1 of [34]

$$W(k,p) := |\mathcal W(k,p)| \le \binom{k}{2p-2}\, p^{2(k-2p+2)}\, 2^{2p-2}. \tag{7.18}$$






This bound will be sufficient for the proof of (7.9) with exponent $1/6 + \varepsilon$. We remark that Lemma 4.1 of
[34] gives a different bound on (7.18) that is better by essentially a factor $[(k - 2p)/p]^{k-2p}$. Applying this
bound, one could improve the exponent in (7.9) to $1/4 + \varepsilon$ but we will not pursue this improvement here.

We also need the concept of labelling the elements of $A_p$ by the set $\{1, 2, \ldots, N\}$. A labelling is given by
a function $\ell : A_p \to \{1, 2, \ldots, N\}$ and we require that $\ell$ be injective. The set of such labelling functions is
denoted by $\mathcal L(p, N)$.

With these notations, we have the formula

$$\mathbb E\,\mathrm{Tr}\,H^k = \sum_{i_1, i_2, \ldots, i_k = 1}^{N} \mathbb E\, h_{i_1 i_2} h_{i_2 i_3} \cdots h_{i_k i_1}
= \sum_{p=1}^{k/2+1}\ \sum_{w \in \mathcal W(k,p)}\ \sum_{\ell \in \mathcal L(p,N)} \mathbb E\, h_{\ell(w_1)\ell(w_2)} h_{\ell(w_2)\ell(w_3)} \cdots h_{\ell(w_k)\ell(w_1)}. \tag{7.19}$$

To verify this formula, for any given sequence $i_1, \ldots, i_k$ on the l.h.s., let $p$ denote the number of different
elements in this sequence and let the set $A_p$ be identified with these different elements in the order of their
appearance (i.e. for any $m$ we let $a_m := i_s$ for some $s$ if $i_s \ne i_t$, $t < s$, and $i_s$ is the $m$-th freshest element
among $i_1, i_2, \ldots, i_s$, i.e., $|\{i_1, i_2, \ldots, i_{s-1}\}| = m - 1$). Let $w_1, w_2, \ldots, w_k$ encode the sequence $i_1, i_2, \ldots, i_k$
with the new labels $a_1, a_2, \ldots, a_p$. One may think of the walk, $w_1, w_2, \ldots, w_k$, as the topological structure of
the sequence $(i_1, i_2, \ldots, i_k)$ where the original labels from the set $\{1, 2, \ldots, N\}$ have been replaced by abstract
labels, defined intrinsically from the repetition structure of $(i_1, i_2, \ldots, i_k)$. Formula (7.19) is a resummation
of all sequences $(i_1, i_2, \ldots, i_k)$ in terms of topological walks (first and second sum) and then reintroducing
the original labelling with $\{1, 2, \ldots, N\}$ (third sum). Since the first moment of $h_{ij}$ vanishes and different
matrix elements are independent, all terms on the right hand side have zero expectation in which at least
one factor $h_{ij}$ appears only once. This justifies the requirement iii) in the definition of the ordered closed
walks. The restriction $p \le k/2 + 1$ in the summation then comes from iii). This proves (7.19).
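The relabelling used in this verification, and properties i)-iii) of ordered closed walks, can be illustrated by brute force for very small $k$. The sketch below is not part of the proof: the function names are hypothetical, the enumeration is exponential in $k$, and the bound it compares against is the right-hand side of (7.18) read as $\binom{k}{2p-2}\,p^{2(k-2p+2)}\,2^{2p-2}$.

```python
from itertools import product
from math import comb

def canonical_walk(indices):
    """Relabel a sequence i_1..i_k by order of first appearance (1, 2, ...)."""
    labels = {}
    walk = []
    for i in indices:
        if i not in labels:
            labels[i] = len(labels) + 1
        walk.append(labels[i])
    return walk

def edge_multiplicities(walk):
    """Multiplicity of each undirected edge of the closed walk (self-loops allowed)."""
    counts = {}
    k = len(walk)
    for s in range(k):
        a, b = walk[s], walk[(s + 1) % k]
        e = (min(a, b), max(a, b))
        counts[e] = counts.get(e, 0) + 1
    return counts

def is_ordered_closed_walk(walk, p):
    """Check properties i)-iii) for a sequence over {1, ..., p}."""
    if canonical_walk(walk) != list(walk):   # i) fresh vertices appear in increasing order
        return False
    if set(walk) != set(range(1, p + 1)):    # ii) all p vertices are visited
        return False
    return all(m >= 2 for m in edge_multiplicities(walk).values())  # iii)

def count_walks(k, p):
    """Brute-force count of ordered closed walks; feasible only for tiny k."""
    return sum(is_ordered_closed_walk(w, p) for w in product(range(1, p + 1), repeat=k))

def walk_bound(k, p):
    """Right-hand side of (7.18) as read above."""
    return comb(k, 2 * p - 2) * p ** (2 * (k - 2 * p + 2)) * 2 ** (2 * p - 2)
```

For example, the index sequence $(5, 2, 5, 2)$ is encoded by the walk $(1, 2, 1, 2)$. In the extreme case $p = k/2 + 1$ every edge is traversed exactly twice and the graph is a tree, so the count is a Catalan number, which is why the leading term of the moment computation behaves like $2^k$.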

To compute the expectation on the r.h.s. of (7.19), we need to introduce the concept of the skeleton
of the walk. Given $w \in \mathcal W(k,p)$, its skeleton $S(w)$ is the undirected graph on $A_p$ that is obtained from $\Gamma(w)$
after replacing each multiple (parallel) edge by a single undirected edge. Here $S(w)$ allows self-loops (as long
as every edge has multiplicity 1). Thus the edge set $E(S(w))$ of the skeleton coincides with the edge set
$E(\Gamma(w))$ after neglecting multiplicity and direction. The skeleton is a subgraph of the complete graph on
$A_p$. We will also define the tree of the walk, $T(w)$, which is just a spanning tree of the skeleton $S(w)$ built
up successively along the walk by a greedy algorithm: include an edge to $T(w)$ if it does not create a
loop together with the previously adjoined edges. Since $\Gamma(w)$ is connected, so is $S(w)$, thus $T(w)$
is indeed a tree on $p$ vertices; in particular the number of its edges is

$$|E(T(w))| = p - 1, \tag{7.20}$$

and $S(w) \setminus T(w)$ has total edge multiplicity less than $k - 2(p - 1)$.

For any edge $e \in E(S(w))$ of the skeleton, let $\nu(e)$ denote the multiplicity of $e$ in $\Gamma(w)$ (edges with both
orientations are taken into account). Clearly

$$\sum_{e \in E(S)} \nu(e) = k \tag{7.21}$$

for any skeleton graph $S = S(w)$ for $w \in \mathcal W(k,p)$. Finally, for a given edge $e = (a_\alpha, a_\beta)$ in a subgraph of $A_p$
and for any labelling $\ell \in \mathcal L(p,N)$, we define the induced labelling of the edge $e$ by $\ell(e) = (\ell(a_\alpha), \ell(a_\beta))$.
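With these notations we have

$$\big|\mathbb E\, h_{\ell(w_1)\ell(w_2)} h_{\ell(w_2)\ell(w_3)} \cdots h_{\ell(w_k)\ell(w_1)}\big| \le \prod_{e \in E(S(w))} \mathbb E\,|h_{\ell(e)}|^{\nu(e)}.$$

Note that $|h_{ij}| = |h_{ji}|$, therefore there is no ambiguity in the notation $|h_{\ell(e)}|$. Since $|h| \le N^{-1/2+\delta}$ and
$\nu(e) \ge 2$, we have

$$\mathbb E\,|h_{\ell(e)}|^{\nu(e)} \le N^{(-1/2+\delta)(\nu(e)-2)}\,\sigma^2_{\ell(e)}, \tag{7.22}$$

or, alternatively,

$$\mathbb E\,|h_{\ell(e)}|^{\nu(e)} \le N^{(-1/2+\delta)\nu(e)}. \tag{7.23}$$

We will use (7.22) for the edges of the tree, $e \in E(T(w))$, and we use (7.23) for the remaining edges
$e \in E(S(w)) \setminus E(T(w))$. We can now estimate (7.19) using (7.21) and (7.20):

$$\begin{aligned}
|\mathbb E\,\mathrm{Tr}\,H^k| &\le \sum_{p=1}^{k/2+1}\ \sum_{w \in \mathcal W(k,p)}\ \sum_{\ell \in \mathcal L(p,N)}\ \prod_{e \in E(S(w))} \mathbb E\,|h_{\ell(e)}|^{\nu(e)} \\
&\le \sum_{p=1}^{k/2+1}\ \sum_{w \in \mathcal W(k,p)} N^{(-1/2+\delta)(k-2(p-1))} \sum_{\ell \in \mathcal L(p,N)}\ \prod_{e \in E(T(w))} \sigma^2_{\ell(e)} \\
&\le \sum_{p=1}^{k/2+1}\ \sum_{w \in \mathcal W(k,p)} N^{1+(-1/2+\delta)(k-2(p-1))}.
\end{aligned} \tag{7.24}$$

In the last step we used that

$$\sum_{\ell \in \mathcal L(p,N)}\ \prod_{e \in E(T)} \sigma^2_{\ell(e)} \le N$$

holds for any tree $T$. This bound follows from successively summing up the labels for vertices with degree
one in $T$ by using the identity $\sum_i \sigma^2_{ij} = 1$.
Using (7.18), we obtain the bound

$$|\mathbb E\,\mathrm{Tr}\,H^k| \le \sum_{p=1}^{k/2+1} S(k,p) \tag{7.25}$$

with

$$S(k,p) := \binom{k}{2p-2}\, p^{2(k-2p+2)}\, 2^{2p-2}\, N^{1+(-1/2+\delta)(k-2(p-1))}. \tag{7.26}$$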

It is easy to show that 

5(fc,p-l)<^^5(fc,p). (7.27) 

Choosing $k = N^{1/6-\delta/3}$, we have $S(k,p-1) \le S(k,p)$. Inserting this into (7.25), we obtain (7.17) and
complete the proof.



Now we prove (7.11) with the same method. Similarly, with the assumption on the distribution of $h_{ij}$,
one can find a $\widetilde h_{ij}$ such that

$$\mathbb P(\widetilde h_{ij} = h_{ij}) \ge 1 - C N^{-c\log\log N} \tag{7.28}$$

and

$$|\widetilde h_{ij}| \le M^{-1/2} n, \qquad \mathbb E(\widetilde h_{ij}) = 0, \qquad \mathbb E(|\widetilde h_{ij}|^2) \le \mathbb E(|h_{ij}|^2) + N^{-c\log\log N} \tag{7.29}$$

for $n = (\log N)(\log\log N)$. Here $\widetilde h_{ij}$ can be obtained by considering the cutoff random variables
$h_{ij}\,\mathbf 1(|h_{ij}| \le M^{-1/2}(\log N)(\log\log N))$ and then slightly modifying them to recover their zero expectation value.

We can again bound $|\mathbb E\,\mathrm{Tr}\,H^k|$ as in (7.25) but with a slightly different $S(k,p)$; instead of the factor
$N^{1+(-1/2+\delta)(k-2(p-1))}$ we will have $N \cdot M^{(-1/2+\delta)(k-2(p-1))}$ in the definition (7.26). These modified $S(k,p)$

numbers satisfy 

S(k,p-l)<^S(k,p) (7.30) 

and 

S(k,k/2+ 1) = 2 k ■ N. (7.31) 

2,6 

Choosing k = n, we have < 1. Thus we obtain 

$$|\mathbb E\,\mathrm{Tr}\,H^k| \le 2^k \cdot 2nN, \tag{7.32}$$

which implies (7.11). □

7.2 Proof of Lemma 7.3. 

First we show that the estimate on the expectation of $m - m_{sc}$ is better than the estimate (2.16) on $m - m_{sc}$
itself.

Lemma 7.5 Assume that the $N \times N$ generalized Wigner matrix $H$ (see (2.6)) satisfies (2.1), (2.4), (2.5)
and (2.14), $\mathbb E h_{ij} = 0$, for any $1 \le i,j \le N$ (i.e. the assumptions of Theorem 3.1 apart from (3.3) hold).
Then we have, with some $C > 0$,

$$|\mathbb E m(z) - m_{sc}(z)| \le \frac{(\log N)^{C}}{N\eta(\kappa + \eta)}, \tag{7.33}$$

for any $z = E + i\eta$, $\eta > 0$.

As a preparation to the proof, we need the following technical lemma that we state under more general 
conditions so that it is applicable for universal Wigner matrices. 

Lemma 7.6 With the assumptions of Theorem 3.1, suppose (3.3) holds. Then we have the estimate

$$\left|\mathbb E m(z) + \frac{1}{\mathbb E m(z) + z}\right| \le \frac{(\log N)^{C_0}\sqrt{\kappa+\eta}}{(M\eta)\,g^2(z)} \tag{7.34}$$

for some sufficiently large positive constant $C_0$ (depending on $\alpha$, $\beta$ in (2.14)).

Proof of Lemma 7.6. Recall the definitions of 0°, ftf and fij in (3.17), (3.12), (3.13) and (3.14) and we 
define 

n z ee ft°nftfnftj. (7.35) 

With (3.34), we have

$$\mathbb P(\Omega_z) \ge 1 - C N^{-c\log\log N}. \tag{7.36}$$
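The r.h.s. of (7.34) is larger than $N^{-2}$. Then with (7.36) and $|m(z)| \le \eta^{-1} \le M$ (see (3.31)), we only need
to prove

$$\left|\mathbb E\,\mathbf 1(\Omega_z) m(z) + \frac{1}{\mathbb E\,\mathbf 1(\Omega_z) m(z) + z}\right| \le \frac{(\log N)^{C}\sqrt{\kappa+\eta}}{(M\eta)\,g(z)^2}. \tag{7.37}$$

Taking the expectation of the self consistent equation (4.11) with (4.12), we obtain that

$$\mathbb E\left[\mathbf 1(\Omega_z)\left(G_{ii} + \frac{1}{z + \sum_j \sigma^2_{ij} G_{jj} - \Upsilon_i}\right)\right] = 0. \tag{7.38}$$

For simplicity, we define

$$A_i := \mathbb E\,[\mathbf 1(\Omega_z) \cdot G_{ii}], \qquad A := \sum_i A_i / N.$$

Together with (7.35) and (7.36), we have

$$|A - m_{sc}|,\ |A_i - A| \ll 1.$$

Then, similarly to (5.12), on the event $\Omega_z$ we have

$$|z + A| - \Big|\sum_j \sigma^2_{ij} A_j - A\Big| - |\Upsilon| \ge C,$$

by using $|z + m_{sc}| \ge 1$ and that on the set $\Omega_z$, $\Upsilon$ is small. Therefore, we can expand (7.38) as

$$0 = A_i + \frac{1}{z+A} - \frac{\sum_j \sigma^2_{ij} A_j - A}{(z+A)^2} + \frac{\mathbb E[\mathbf 1(\Omega_z)\Upsilon_i]}{(z+A)^2}
+ O\!\left(\frac{\mathbb E\big[\mathbf 1(\Omega_z)\,\big|\sum_j \sigma^2_{ij} G_{jj} - A\big|^2\big]}{(z+A)^3}\right)
+ O\!\left(\frac{\mathbb E\big[\mathbf 1(\Omega_z)\,|\Upsilon_i|^2\big]}{(z+A)^3}\right). \tag{7.39}$$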

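Then summing up $1 \le i \le N$, we obtain that

$$\left|A + \frac{1}{z+A}\right| \le C \max_i \big|\mathbb E[\mathbf 1(\Omega_z)\Upsilon_i]\big| + C \max_i \mathbb E\Big[\mathbf 1(\Omega_z)\Big|\sum_j \sigma^2_{ij} G_{jj} - A\Big|^2\Big] + C\,\mathbb E\big[\mathbf 1(\Omega_z)|\Upsilon|^2\big]. \tag{7.40}$$

Applying (3.12) and the definition of $\Omega_z$, we can bound the second and third terms in the r.h.s. of (7.40)
with some constant $C$ as follows,

$$\left|A + \frac{1}{z+A}\right| \le C \max_i \big|\mathbb E[\mathbf 1(\Omega_z)\Upsilon_i]\big| + \frac{(\log N)^{C}\sqrt{\kappa+\eta}}{M\eta\, g(z)^2}. \tag{7.41}$$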

If rj > 3, we estimate E[l(f2 z )Tj] as 

E[l(n,)T<] < E[l(fi;)Ti] +Efl([f2°] c )l(f] z )|T l 



(7.42)

With (4.7), we have

$$|G_{ij} G_{ji} / G_{ii}| \le 2/\eta. \tag{7.43}$$
Then, with the definition of $\Upsilon_i$ (4.12) and (4.39), we have

$$\mathbb P\big(|\max_i \Upsilon_i| \ge N^{C}\big) \le e^{-N^{c}} \tag{7.44}$$

for some positive constants $c$ and $C$. Inserting this and (3.20) into (7.42), we have

$$\left|A + \frac{1}{z+A}\right| \le \frac{(\log N)^{C}\sqrt{\kappa+\eta}}{M\eta\, g^2(z)} \tag{7.45}$$

in the case of $\eta \ge 3$. If $\eta \le 3$, similarly, with (3.23) we have the same result. This proves (7.37) and thus
completes the proof of Lemma 7.6. □

Proof of Lemma 7.5. First we will prove the result for large $\eta$, more precisely we show (7.33) under the
additional assumption that

$$N\eta(\kappa + \eta)^{3/2} \ge (\log N)^{C_1}, \tag{7.46}$$

with a sufficiently large constant $C_1$.

In the case of the generalized Wigner matrix, (2.6), we have $M \ge (C_{sup})^{-1} N$ and $\delta_+ \ge C_{inf}$ (2.7), then

$$g(z) \asymp \sqrt{\kappa + \eta}$$

up to an $O(1)$ factor. Note that with a sufficiently large $C_1$, (7.46) implies (3.3) and thus combining Lemma
7.6 with Lemma 5.2 we obtain (7.33) under the condition that $\eta$ satisfies (7.46).

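To prove (7.33) for any $\eta > 0$, it remains to consider the case when (7.46) does not hold. For a fixed
$E$, let $\eta^* = \eta^*(E) > 0$ be the (unique) solution of $N\eta(\kappa + \eta)^{3/2} = (\log N)^{C_1}$, i.e. when (7.46) becomes an
equality. In particular, we know that

$$|\mathbb E m(z^*) - m_{sc}(z^*)| \le \frac{(\log N)^{C}}{(N\eta^*)(\kappa + \eta^*)}. \tag{7.47}$$

Consider $\eta \le \eta^*$, set $z = E + i\eta$, $z^* = E + i\eta^*$ and estimate

$$|\mathbb E m(z) - m_{sc}(z)| \le |\mathbb E m(z^*) - m_{sc}(z^*)| + \int_{\eta}^{\eta^*} \big|\partial_y\big(\mathbb E m(E + iy) - m_{sc}(E + iy)\big)\big|\,dy. \tag{7.48}$$

Note that

$$|\partial_y m(E + iy)| = \Big|\frac{1}{N}\sum_j \partial_y G_{jj}(E + iy)\Big| \le \frac{1}{N}\sum_{jk} |G_{jk}(E + iy)|^2 = \frac{1}{Ny}\sum_j \operatorname{Im} G_{jj}(E + iy) = \frac{1}{y}\operatorname{Im} m(E + iy), \tag{7.49}$$

and similarly

$$|\partial_y m_{sc}(E + iy)| = \Big|\int \frac{\varrho_{sc}(x)}{(x - E - iy)^2}\,dx\Big| \le \int \frac{\varrho_{sc}(x)}{|x - E - iy|^2}\,dx = \frac{1}{y}\operatorname{Im} m_{sc}(E + iy). \tag{7.50}$$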
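The middle equality in (7.49) is the resolvent identity $\sum_k |G_{jk}(E+iy)|^2 = y^{-1}\operatorname{Im} G_{jj}(E+iy)$, which follows from $G - G^* = 2iy\,GG^*$. As a sanity check (a toy illustration, not part of the argument), it can be verified numerically for a $2\times 2$ real symmetric matrix; the helper names and the numerical values below are arbitrary.

```python
def resolvent_2x2(a: float, b: float, d: float, z: complex):
    """Return G = (H - z)^{-1} for the real symmetric H = [[a, b], [b, d]]."""
    det = (a - z) * (d - z) - b * b
    return [[(d - z) / det, -b / det], [-b / det, (a - z) / det]]

def ward_gap(a: float, b: float, d: float, E: float, y: float) -> float:
    """Largest deviation |sum_k |G_jk|^2 - Im G_jj / y| over j, at z = E + iy."""
    G = resolvent_2x2(a, b, d, complex(E, y))
    gaps = []
    for j in range(2):
        lhs = sum(abs(G[j][k]) ** 2 for k in range(2))
        rhs = G[j][j].imag / y
        gaps.append(abs(lhs - rhs))
    return max(gaps)
```

The same identity holds for any Hermitian matrix and any $y > 0$; only the explicit $2\times 2$ inverse is special to this sketch.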
Now we use the fact that the functions $y \to y\operatorname{Im} m(E + iy)$ and $y \to y\operatorname{Im} m_{sc}(E + iy)$ are monotone increasing
for any $y > 0$ since both are Stieltjes transforms of a positive measure. Therefore the integral in (7.48) can
be bounded by

$$\int_{\eta}^{\eta^*} \frac{dy}{y}\big[\operatorname{Im}\mathbb E m(E + iy) + \operatorname{Im} m_{sc}(E + iy)\big] \le \eta^*\big[\operatorname{Im}\mathbb E m(E + i\eta^*) + \operatorname{Im} m_{sc}(E + i\eta^*)\big]\int_{\eta}^{\eta^*}\frac{dy}{y^2}. \tag{7.51}$$

By the choice of $\eta^*$ and using that $\operatorname{Im} m_{sc}(z^*) \le C\sqrt{\kappa + \eta^*}$, we have

$$\operatorname{Im} m_{sc}(z^*) \le \frac{(\log N)^{C}}{N\eta^*(\kappa + \eta^*)} \tag{7.52}$$

and then $\operatorname{Im}\mathbb E m(z^*)$ can be estimated from (7.47). Inserting these estimates into (7.48) and (7.51), and
using (7.47), we get

$$|\mathbb E m(z) - m_{sc}(z)| \le |\mathbb E m(z^*) - m_{sc}(z^*)| + \frac{2(\log N)^{C}\,\eta^*}{N\eta^*(\kappa + \eta^*)\,\eta} \le \frac{(\log N)^{C}}{N\eta(\kappa + \eta)}$$

with a possibly larger $C$ in the r.h.s. This completes the proof of Lemma 7.5. □

With Lemma 7.5, it follows that for any $E$ and $\eta > 0$,

$$|n^\lambda(E + \eta) - n^\lambda(E - \eta)| + |n_{sc}(E + \eta) - n_{sc}(E - \eta)| \le \eta(\log N)^{C}\left(1 + \frac{1}{N\eta\,(\big||E| - 2\big| + \eta)}\right). \tag{7.53}$$

Now we return to the main argument to prove (7.12) in Lemma 7.3. Given (7.10), we only need to prove

$$\int_{-3}^{3} |n^\lambda(E) - n_{sc}(E)|\,dE \le C N^{-1+\varepsilon}. \tag{7.54}$$

This inequality follows from the next lemma by choosing the signed measure

$$\varrho^\Delta(dx) = \varrho_{sc}(dx) - \mathbb E\,\frac{1}{N}\sum_{j=1}^{N}\delta_{\lambda_j}(dx) \tag{7.55}$$

whose Stieltjes transform is given by

$$m^\Delta(z) = m_{sc}(z) - \mathbb E m(z) \tag{7.56}$$

and the conditions (7.58) and (7.59) are provided by (7.33) and (7.53). This will complete the proof of
Lemma 7.3.

Lemma 7.7 Let $\varrho^\Delta(dx)$ be a finite signed measure with support in $[-K, K]$ for some $K > 0$. Let

$$m^\Delta(z) := \int_{\mathbb R} \frac{\varrho^\Delta(dx)}{x - z}, \qquad n^\Delta(E) := \int_{-\infty}^{E} \varrho^\Delta(dx) \tag{7.57}$$

be the Stieltjes transform and the distribution function of $\varrho^\Delta(dx)$, respectively. Let $\kappa_x$, $\kappa_E$ denote $\big||x| - 2\big|$
and $\big||E| - 2\big|$. We assume that $m^\Delta$ satisfies the following bound with some constant $C$:

$$|m^\Delta(x + iy)| \le \frac{(\log N)^{C}}{(Ny)(\kappa_x + y)} \quad\text{for } y > 0,\ |x| \le K + 1, \tag{7.58}$$

and for any $a > 0$

$$\int_{E-a}^{E+a} |\varrho^\Delta|(dx) \le a(\log N)^{C}\left(1 + \frac{1}{Na(\kappa_E + a)}\right). \tag{7.59}$$

Then

$$\int_{-K}^{K} dE\, |n^\Delta(E)| \le C N^{-1}(\log N)^{C} \tag{7.60}$$

for some constant $C > 0$ when $N$ is sufficiently large.

This lemma is similar to Lemma B.1 in [16], but with different assumptions. Since the assumptions here
are stronger than (B.3) and (B.4) in [16], we actually obtain a better bound (7.60) than in [16], where the
l.h.s. of (7.60) was bounded by $N^{-6/7}$.

Proof of Lemma 7.7. For simplicity, we omit the $\Delta$ superscript in the proof. For a fixed $E \in [-K, K]$,
$\eta > 0$, define a function $f = f_{E,\eta} : \mathbb R \to \mathbb R$ such that $f(x) = 1$ for $x \in [-K, E - \eta]$, $f(x)$ vanishes for
$x \in (-\infty, -K - 1) \cup [E + \eta, \infty)$, moreover $|f'(x)| \le C\eta^{-1}$ and $|f''(x)| \le C\eta^{-2}$. Then

$$\left|n(E) - \int f_{E,\eta}(\lambda)\varrho(\lambda)\,d\lambda\right| \le \int_{E-\eta}^{E+\eta} |\varrho|(dx) \le \eta(\log N)^{C}\left(1 + \frac{1}{N\eta(\kappa_E + \eta)}\right). \tag{7.61}$$

We will choose $\eta = N^{-1}$ and set $f_E := f_{E,\eta}$ with $\eta = 1/N$. Then to prove (7.60), we only need to prove that

$$\int_{|E| \le K+1} \left|\int f_E(\lambda)\varrho(\lambda)\,d\lambda\right| dE \le N^{-1}(\log N)^{C} \tag{7.62}$$

for some $C > 0$.

To express $f_E(\lambda)$ in terms of the Stieltjes transform, we use the Helffer-Sjöstrand functional calculus, as
(B.12) in [16]. We formulate this result in a more general form.

Lemma 7.8 Let $f_{E,\eta}$ be given as above with some $E \in [-K, K]$, $K \ge 3$, and $0 < \eta \le 1/2$. Suppose that the
Stieltjes transform $m$ of the signed measure $\varrho$ satisfies

$$|m(x + iy)| \le \frac{L}{(Ny)^{\tau}(\kappa_x + y)^{\sigma}} \quad\text{for } y > 0,\ |x| \le K + 1, \tag{7.63}$$

with some exponents $0 \le \tau, \sigma \le 1$ and some constant $L$. Then

$$\left|\int f_E(\lambda)\varrho(\lambda)\,d\lambda\right| \le \frac{CL\,|\log\eta|}{N^{\tau}(\kappa_E + \eta)^{\sigma}} \tag{7.64}$$

with some constant $C$ depending on $K$.

The condition of this lemma with $\tau = \sigma = 1$ and $L = (\log N)^{C}$ coincides with (7.58), therefore, after
integrating in $E$ and using $\eta = 1/N$, we obtain (7.62) which completes the proof of Lemma 7.7. □
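Proof of Lemma 7.8. Analogously to (B.13), (B.14) and (B.15) in [16] we obtain that

$$\begin{aligned}
\left|\int f_E(\lambda)\varrho(\lambda)\,d\lambda\right| \le{}& C\int_{\mathbb R^2} \big(|f_E(x)| + |y||f'_E(x)|\big)|\chi'(y)||m(x + iy)|\,dx\,dy \\
&+ C\left|\int_{|y| \le \eta}\int y f''_E(x)\chi(y)\operatorname{Im} m(x + iy)\,dx\,dy\right| \\
&+ C\left|\int_{|y| \ge \eta}\int y f''_E(x)\chi(y)\operatorname{Im} m(x + iy)\,dx\,dy\right|,
\end{aligned} \tag{7.65}$$

where $\chi(y)$ is a smooth cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for $|y| \le 1/2$ and with bounded
derivatives. The first term is estimated by

$$\int_{\mathbb R^2} \big(|f_E(x)| + |y||f'_E(x)|\big)|\chi'(y)||m(x + iy)|\,dx\,dy \le \frac{CL}{N^{\tau}}, \tag{7.66}$$

using (7.63) and the support of $\chi'$.
With (7.63) and $|f''_E| \le C\eta^{-2}$ and

$$\operatorname{supp} f''_E(x) \subset \{|x - E| \le \eta\},$$

the second term in the r.h.s. of (7.65) is bounded by

$$\int_{0 \le y \le \eta}\int_{|x-E| \le \eta} y|f''_E(x)|\frac{CL}{(Ny)^{\tau}(\kappa_x + y)^{\sigma}}\,dx\,dy
\le \frac{CL}{N^{\tau}\eta^2}\int_{0 \le y \le \eta}\int_{|x-E| \le \eta} \frac{y^{1-\tau}}{(\kappa_x + y)^{\sigma}}\,dx\,dy
\le \frac{CL\,\eta^{1-\tau}|\log\eta|}{N^{\tau}(\kappa_E + \eta)^{\sigma}}. \tag{7.67}$$

Here we used that for $y \le 1/2$ we have

$$\int_{|x-E| \le \eta} \frac{dx}{(\kappa_x + y)^{\sigma}} \le \frac{C\eta\,|\log y|}{(\kappa_E + \eta)^{\sigma}}.$$

As in (B.17) and (B.19) in [16], we integrate the third term in (7.65) by parts first in $x$, then in $y$. Then we
bound it in absolute value by

$$C\int_{|x| \le K+1} \eta|f'_E(x)||\operatorname{Re} m(x + i\eta)|\,dx + C\int_{\mathbb R^2} |f'_E(x)\chi'(y)\operatorname{Re} m(x + iy)|\,dx\,dy + \frac{C}{\eta}\int_{\eta \le y \le 1}\int_{|x-E| \le \eta} |\operatorname{Re} m(x + iy)|\,dx\,dy. \tag{7.68}$$

The middle term is bounded as (7.66). With (7.63) again, we have

$$(7.68) \le \frac{CL}{(N\eta)^{\tau}}\int_{|x-E| \le \eta}\frac{dx}{(\kappa_x + \eta)^{\sigma}} + \frac{CL}{N^{\tau}} + \frac{CL}{\eta N^{\tau}}\int_{\eta \le y \le 1}\int_{|x-E| \le \eta}\frac{1}{y^{\tau}(\kappa_x + y)^{\sigma}}\,dx\,dy \le \frac{CL\,|\log\eta|}{N^{\tau}(\kappa_E + \eta)^{\sigma}}. \tag{7.69}$$

Then combining (7.65), (7.66), (7.67), (7.68) and (7.69) we obtain (7.64) and complete the proof of
Lemma 7.8. □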
7.3 Proof of Lemma 7.4



Define the variables 



(7.70) 

Denote by $u_\alpha$ and $\lambda_\alpha$ the eigenvectors and eigenvalues of $H$. For any collection of real numbers, $c_\alpha \in \mathbb R$, we
have



E 



E 



(i) 



(7.71) 

With the choice $c_\alpha = K^{-1}$, $\alpha = j, j+1, \ldots, j+K-1$, and $c_\alpha = 0$ otherwise, we get $|\nabla\lambda_{j,K}|^2 \le C_{sup}(NK)^{-1}$.
Using the Bobkov-Götze concentration inequality [4] and the uniform bound on the LSI constant (6.14), we
get

$$\mathbb P\big(|\lambda_{j,K} - \mathbb E\lambda_{j,K}| \ge \gamma\big) \le e^{-\gamma T}\,\mathbb E\,e^{C_S T^2 |\nabla\lambda_{j,K}|^2} \le e^{-\gamma T + C_S C_{sup} T^2/(NK)}$$

for any $T$ and $\gamma$. Choosing $\gamma = N^{-1/2+\delta}K^{-1/2}$ and $T = (NK)^{1/2}$, we obtain (7.13). □
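For the reader's convenience, the arithmetic of this last choice can be written out; the sketch below only substitutes the stated values of $\gamma$ and $T$ into the exponent of the preceding bound.

```latex
\begin{aligned}
\gamma T &= N^{-1/2+\delta} K^{-1/2}\cdot (NK)^{1/2} = N^{\delta}, \qquad
\frac{C_S C_{sup} T^2}{NK} = C_S C_{sup}, \\
\text{so}\quad
\mathbb P\big(|\lambda_{j,K} - \mathbb E\lambda_{j,K}| \ge N^{-1/2+\delta}K^{-1/2}\big)
 &\le e^{-N^{\delta} + C_S C_{sup}} = C\,e^{-N^{\delta}},
 \qquad C := e^{C_S C_{sup}} .
\end{aligned}
```

This makes explicit why the constant in (7.13) depends only on $C_{sup}$ and $C_S$.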



8 Proof of the Green's function comparison theorem 

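Proof of Theorem 2.3. From the trivial bound

$$\operatorname{Im}\left(\frac{1}{H - E - i\eta}\right)_{jj} \le \left(\frac{y}{\eta}\right)\operatorname{Im}\left(\frac{1}{H - E - iy}\right)_{jj}, \qquad \eta \le y,$$

and from (2.21) we have the following a priori bound

$$\mathbb P\left(\max_{0 \le \gamma \le \gamma(N)}\ \max_{1 \le k \le N}\ \max_{|E| \le 2-\kappa}\ \sup_{\eta \ge N^{-1-\varepsilon}} \left|\operatorname{Im}\left(\frac{1}{H_\gamma - E \pm i\eta}\right)_{kk}\right| \le N^{3\tau+\varepsilon}\right) \ge 1 - C N^{-c\log\log N}. \tag{8.1}$$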

Note that the supremum over $\eta$ can be included by establishing the estimate first for a fine grid of $\eta$'s
with spacing $N^{-10}$ and then extending the bound to all $\eta$ by using that the Green's functions are Lipschitz
continuous in $\eta$ with a Lipschitz constant $\eta^{-2}$.

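Let $\lambda_m$ and $u_m$ denote the eigenvalues and eigenvectors of $H_\gamma$, then by the definition of the Green's
function, we have

$$\left|\left(\frac{1}{H_\gamma - z}\right)_{jk}\right| \le \sum_{m=1}^{N}\frac{|u_m(j)||u_m(k)|}{|\lambda_m - z|} \le \left[\sum_{m=1}^{N}\frac{|u_m(j)|^2}{|\lambda_m - z|}\right]^{1/2}\left[\sum_{m=1}^{N}\frac{|u_m(k)|^2}{|\lambda_m - z|}\right]^{1/2}.$$

Define a dyadic decomposition

$$\begin{aligned}
U_n &= \{m : 2^{n-1}\eta \le |\lambda_m - E| < 2^{n}\eta\}, \qquad n = 1, 2, \ldots, n_0 := C\log N, \\
U_0 &= \{m : |\lambda_m - E| < \eta\}, \qquad U_\infty := \{m : 2^{n_0}\eta \le |\lambda_m - E|\},
\end{aligned} \tag{8.2}$$

and divide the summation over $m$ into $\cup_n U_n$:

$$\sum_{m=1}^{N}\frac{|u_m(j)|^2}{|\lambda_m - z|} = \sum_n \sum_{m \in U_n}\frac{|u_m(j)|^2}{|\lambda_m - z|} \le C\sum_n \sum_{m \in U_n}\operatorname{Im}\frac{|u_m(j)|^2}{\lambda_m - E - i2^{n}\eta} \le C\sum_n \operatorname{Im}\left(\frac{1}{H_\gamma - E - i2^{n}\eta}\right)_{jj}.$$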
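The step that replaces $1/|\lambda_m - z|$ by an imaginary part at the coarser scale $2^n\eta$ rests on an elementary inequality: if $2^{n-1}\eta \le |\lambda - E| \le 2^{n}\eta$, then $1/|\lambda - E - i\eta|$ is at most a constant times $\operatorname{Im}\,(\lambda - E - i2^{n}\eta)^{-1}$. A direct computation gives the constant $5/2$ (so $4$ is a safe crude choice). The sketch below checks this numerically on a grid; it is an illustration with hypothetical function names, not part of the proof.

```python
def shell_ratio(dist: float, n: int, eta: float) -> float:
    """Ratio of 1/|lambda - z| to Im 1/(lambda - E - i 2^n eta), where dist = |lambda - E|."""
    lhs = 1.0 / abs(complex(dist, -eta))
    rhs = (1.0 / complex(dist, -(2 ** n) * eta)).imag
    return lhs / rhs

def worst_ratio(eta: float, n_max: int, samples: int = 200) -> float:
    """Maximum of shell_ratio over a grid of points in the dyadic shells n = 1..n_max."""
    worst = 0.0
    for n in range(1, n_max + 1):
        lo, hi = 2 ** (n - 1) * eta, 2 ** n * eta
        for s in range(samples + 1):
            dist = lo + (hi - lo) * s / samples
            worst = max(worst, shell_ratio(dist, n, eta))
    return worst
```

Summing $|u_m(j)|^2$ times this inequality over $m \in U_n$ and then extending the sum to all $m$ (all terms are nonnegative) yields the diagonal resolvent entry at spectral parameter $E + i2^{n}\eta$.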
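Using the estimate (2.21) for $n = 0, 1, \ldots, n_0$ and a trivial bound of $O(1)$ for $n = \infty$, we have proved that

$$\mathbb P\left(\sup_{\eta \ge N^{-1-\varepsilon}}\ \sup_{0 \le \gamma \le \gamma(N)}\ \max_{1 \le k,\ell \le N}\ \sup_{|E| \le 2-\kappa} \left|\left(\frac{1}{H_\gamma - E \pm i\eta}\right)_{k\ell}\right| \le N^{4\tau+\varepsilon}\right) \ge 1 - C N^{-c\log\log N}. \tag{8.3}$$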

For simplicity, we will consider the case when the test function $F$ has only $n = 1$ variable and $k_1 = 1$,
i.e., we consider the trace of a first order monomial; the general case follows analogously. Consider the
telescoping sum of differences of expectations



EF | — Tr 



1 



N H( v ) - 



I — Tr— 4 



(8.4) 



7 (JV) 

E 

7=1 



[ — Ti 



1 



| — Tr - 

N H 7 _! 



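Let $E^{(ij)}$ denote the matrix whose matrix elements are zero everywhere except at the $(i,j)$ position, where
it is 1, i.e., $E^{(ij)}_{k\ell} = \delta_{ik}\delta_{j\ell}$. Fix a $\gamma \ge 1$ and let $(i,j)$ be determined by $\phi(i,j) = \gamma$. We will compare $H_{\gamma-1}$
with $H_\gamma$. Note that these two matrices differ only in the $(i,j)$ and $(j,i)$ matrix elements and they can be
written as

$$H_{\gamma-1} = Q + \frac{1}{\sqrt N}V, \qquad V := v_{ij}E^{(ij)} + v_{ji}E^{(ji)},$$

$$H_\gamma = Q + \frac{1}{\sqrt N}W, \qquad W := w_{ij}E^{(ij)} + w_{ji}E^{(ji)},$$

with a matrix $Q$ that has zero matrix element at the $(i,j)$ and $(j,i)$ positions and where we set $v_{ji} := \bar v_{ij}$
for $i < j$ and similarly for $w$. Define the Green's functions

$$R = \frac{1}{Q - z}, \qquad S = \frac{1}{H_\gamma - z}.$$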

We first claim that the estimate (8.3) holds for the Green's function $R$ as well. To see this, we have, from
the resolvent expansion,

$$R = S + N^{-1/2}SVS + \ldots + N^{-9/2}(SV)^9 S + N^{-5}(SV)^{10}R.$$

Since $V$ has at most two nonzero elements, when computing the $(k,\ell)$ matrix element of this matrix
identity, each term is a finite sum involving matrix elements of $S$ or $R$ and $v_{ij}$, e.g. $(SVS)_{k\ell} = S_{ki}v_{ij}S_{j\ell} +
S_{kj}v_{ji}S_{i\ell}$. Using the bound (8.3) for the $S$ matrix elements, the subexponential decay for $v_{ij}$ and the trivial
bound $|R_{ij}| \le \eta^{-1}$, we obtain that the estimate (8.3) holds for $R$.

We can now start proving the main result. By the resolvent expansion,

$$S = R - N^{-1/2}RVR + N^{-1}(RV)^2R - N^{-3/2}(RV)^3R + N^{-2}(RV)^4R - N^{-5/2}(RV)^5S,$$

so we can write

$$\frac{1}{N}\mathrm{Tr}\,S = \widehat R + \xi, \qquad \xi := \sum_{m=1}^{4} N^{-m/2}R^{(m)} + N^{-5/2}\Omega$$

with

$$\widehat R := \frac{1}{N}\mathrm{Tr}\,R, \qquad R^{(m)} := (-1)^m\frac{1}{N}\mathrm{Tr}\,(RV)^mR, \qquad \Omega := -\frac{1}{N}\mathrm{Tr}\,(RV)^5S.$$

For each diagonal element in the computation of these traces, the contribution to $\widehat R$, $R^{(m)}$ and $\Omega$ is a sum
of a few terms. E.g.

$$R^{(2)} = \frac{1}{N}\sum_k \Big[R_{ki}v_{ij}R_{jj}v_{ji}R_{ik} + R_{ki}v_{ij}R_{ji}v_{ij}R_{jk} + R_{kj}v_{ji}R_{ii}v_{ij}R_{jk} + R_{kj}v_{ji}R_{ij}v_{ji}R_{ik}\Big]$$

and similar formulas hold for the other terms.
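The finite resolvent expansion used above is an exact algebraic identity: if $R = (Q - z)^{-1}$ and $S$ is the resolvent of $Q + N^{-1/2}V$ at the same $z$, then $S = \sum_{m=0}^{4}(-N^{-1/2}RV)^m R + (-N^{-1/2}RV)^5 S$, obtained by iterating $S = R - N^{-1/2}RVS$. The following is a minimal numerical sketch for $2\times 2$ matrices, written with plain Python lists; the helper names and test values are hypothetical and serve only to illustrate the identity.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def resolvent(H, z):
    """(H - z)^{-1} for a 2x2 matrix H."""
    return inv2([[H[0][0] - z, H[0][1]], [H[1][0], H[1][1] - z]])

def expansion_error(Q, V, N, z):
    """Max entrywise gap between S and its fifth-order resolvent expansion around R."""
    s = N ** -0.5
    H = [[Q[i][j] + s * V[i][j] for j in range(2)] for i in range(2)]
    R, S = resolvent(Q, z), resolvent(H, z)
    RV = matmul(R, V)
    total = [[0j, 0j], [0j, 0j]]
    power = [[1, 0], [0, 1]]                  # (RV)^m, starting from m = 0
    for m in range(5):
        term = matmul(power, R)               # (RV)^m R
        coeff = (-s) ** m
        total = [[total[i][j] + coeff * term[i][j] for j in range(2)] for i in range(2)]
        power = matmul(power, RV)
    last = matmul(power, S)                   # remainder term (RV)^5 S
    coeff = (-s) ** 5
    total = [[total[i][j] + coeff * last[i][j] for j in range(2)] for i in range(2)]
    return max(abs(total[i][j] - S[i][j]) for i in range(2) for j in range(2))
```

Because the remainder term carries the full resolvent $S$, no smallness of $V$ is needed for the identity itself; smallness only matters when the remainder is estimated, as in the treatment of $\Omega$.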
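Then we have

$$\begin{aligned}
\mathbb E F\left(\frac{1}{N}\mathrm{Tr}\frac{1}{H_\gamma - z}\right) &= \mathbb E F(\widehat R + \xi) \\
&= \mathbb E\Big[F(\widehat R) + F'(\widehat R)\xi + F''(\widehat R)\xi^2 + \ldots + F^{(5)}(\widehat R + \xi')\xi^5\Big] \\
&= \sum_{m=0}^{5} N^{-m/2}\,\mathbb E A^{(m)},
\end{aligned} \tag{8.5}$$

where $\xi'$ is a number between $0$ and $\xi$ and it depends on $\widehat R$ and $\xi$; the $A^{(m)}$'s are defined as

$$A^{(0)} = F(\widehat R), \qquad A^{(1)} = F'(\widehat R)R^{(1)}, \qquad A^{(2)} = F''(\widehat R)(R^{(1)})^2 + F'(\widehat R)R^{(2)},$$

and similarly for $A^{(3)}$ and $A^{(4)}$. Finally,

$$A^{(5)} = F'(\widehat R)\Omega + F^{(5)}(\widehat R + \xi')(R^{(1)})^5 + \ldots.$$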

The expectation values of the terms $A^{(m)}$, $m \le 4$, with respect to $v_{ij}$ are determined by the first four
moments of $v_{ij}$, for example



EA& 



[ jV 51 RkiR 33 Rl 
k 

^'(R^^R^iR 



k + 



El Mi 



e^ 



r l 



F"( R ) RkiRjiRijRi 



k.t 



E \vi 



r l 



F"( R ) J^2 5Z RkiRjlRiiRjk + 



k.i: 



Ev 



Note that the coefficients involve up to four derivatives of $F$ and normalized sums of matrix elements of $R$.
Using the estimate (8.3) for $R$ and the derivative bounds (2.23) for the typical values of $\widehat R$, we see that all
these coefficients are bounded by $N^{C(\tau+\varepsilon)}$ with a very large probability, where $C$ is an explicit constant.
We use the bound (2.24) for the extreme values of $\widehat R$ but this event has a very small probability by (8.3).
Therefore, the coefficients of the moments $\mathbb E\,\bar v_{ij}^{\,s}v_{ij}^{\,u}$, $u + s \le 4$, in the quantities $A^{(0)}, \ldots, A^{(4)}$ are essentially
bounded, modulo a factor $N^{C(\tau+\varepsilon)}$. Notice that the fourth moment of $v_{ij}$ appears only in the $m = 4$ term
that already has a prefactor $N^{-2}$ in (8.5). Therefore, to compute the $m \le 4$ terms in (8.5) up to a precision
$o(N^{-2})$, it is sufficient to know the first three moments of $v_{ij}$ exactly and the fourth moment only with a
precision $N^{-\delta}$; if $\tau$ and $\varepsilon$ are chosen such that $C(\tau + \varepsilon) < \delta$, then the discrepancy in the fourth moment is
irrelevant.
Finally, we have to estimate the error term $A^{(5)}$. All terms without $\Omega$ can be dealt with as before; after
estimating the derivatives of $F$ by $N^{C(\tau+\varepsilon)}$, one can perform the expectation with respect to $v_{ij}$ that is
independent of $R^{(m)}$. For the terms involving $\Omega$ one can argue similarly, by appealing to the fact that the
matrix elements of $S$ are also essentially bounded by $N^{C(\tau+\varepsilon)}$, see (8.3), and that $v_{ij}$ has subexponential
decay. Alternatively, one can use the Hölder inequality to decouple $S$ from the rest and use (8.3) directly, for
example:

E\F'(R)n\ = ^E|F'OR)Tr(i?V0 5 S*| < [E(F'(F)) 2 Tr S 2 ] ^ [ETr (RV) 5 (VR*) 5 ] 1/2 < CN ci - T+e) . 

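Note that exactly the same perturbation expansion holds for the resolvent of $H_{\gamma-1}$, just $v_{ij}$ is replaced with $w_{ij}$ everywhere. By the moment matching condition, the expectation values $\mathbb{E} A^{(m)}$ of terms for $m \le 3$ in (8.5) are identical and the $m = 4$ term differs by $N^{-\delta + C(\tau+\varepsilon)}$. Choosing $\tau = \varepsilon$, we have

$$\left|\,\mathbb{E}\, F\Big(\frac{1}{N}\mathrm{Tr}\,\frac{1}{H_\gamma - z}\Big) - \mathbb{E}\, F\Big(\frac{1}{N}\mathrm{Tr}\,\frac{1}{H_{\gamma-1} - z}\Big)\right| \le CN^{-5/2 + C\varepsilon} + CN^{-2-\delta+C\varepsilon}.$$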



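After summing up in (8.4) we have thus proved that

$$\left|\,\mathbb{E}\, F\Big(\frac{1}{N}\mathrm{Tr}\,\frac{1}{H^{(v)} - z}\Big) - \mathbb{E}\, F\Big(\frac{1}{N}\mathrm{Tr}\,\frac{1}{H^{(w)} - z}\Big)\right| \le CN^{-1/2 + C\varepsilon} + CN^{-\delta+C\varepsilon}.$$

The proof can be easily generalized to functions of several variables. This concludes the proof of Theorem 2.3. □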



Proof of Theorem 6.4. Define an approximate delta function (times $\pi$) at the scale $\eta$ by

$$\theta_\eta(x) := \mathrm{Im}\,\frac{1}{x - i\eta}.$$

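For notational simplicity, we will prove only the case of three point correlation functions; the proof is analogous for the general case. By definition of the correlation function, for any fixed $E$, $\alpha_1, \alpha_2, \alpha_3$,

$$\mathbb{E}^w\,\frac{1}{N(N-1)(N-2)}\sum_{i\ne j\ne k}\theta_\eta\Big(\lambda_i - E - \frac{\alpha_1}{N}\Big)\theta_\eta\Big(\lambda_j - E - \frac{\alpha_2}{N}\Big)\theta_\eta\Big(\lambda_k - E - \frac{\alpha_3}{N}\Big)$$

$$\qquad = \int dx_1 dx_2 dx_3\; p^{(3)}_{w,N}(x_1,x_2,x_3)\,\theta_\eta(x_1 - E_1)\theta_\eta(x_2 - E_2)\theta_\eta(x_3 - E_3), \qquad E_j := E + \frac{\alpha_j}{N}. \tag{8.6}$$

By the exclusion-inclusion principle,

$$\mathbb{E}^w\,\frac{1}{N(N-1)(N-2)}\sum_{i\ne j\ne k}\theta_\eta(\lambda_i - E_1)\theta_\eta(\lambda_j - E_2)\theta_\eta(\lambda_k - E_3) = \mathbb{E}^w A_1 + \mathbb{E}^w A_2 + \mathbb{E}^w A_3, \tag{8.7}$$

where

$$A_1 := \frac{1}{N(N-1)(N-2)}\prod_{j=1}^{3}\Big[\sum_i \theta_\eta(\lambda_i - E_j)\Big],$$

$$A_3 := \frac{2}{N(N-1)(N-2)}\sum_i \theta_\eta(\lambda_i - E_1)\theta_\eta(\lambda_i - E_2)\theta_\eta(\lambda_i - E_3)$$

and

$$A_2 := B_1 + B_2 + B_3, \quad\text{with}\quad B_3 = -\frac{1}{N(N-1)(N-2)}\sum_i \theta_\eta(\lambda_i - E_1)\theta_\eta(\lambda_i - E_2)\sum_k \theta_\eta(\lambda_k - E_3),$$

and similarly, $B_1$ consists of terms with $j = k$, while $B_2$ consists of terms with $i = k$.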
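The combinatorics behind the exclusion-inclusion step can be checked directly. The sketch below (illustrative names, not from the paper) compares the sum over pairwise distinct indices with the combination "full product, minus the three partially coinciding sums, plus twice the fully coinciding sum", which is the unnormalized form of $A_1 + A_2 + A_3$.

```python
from itertools import permutations
import random

def theta(x, eta):
    """Im 1/(x - i*eta) = eta / (x^2 + eta^2)."""
    return eta / (x * x + eta * eta)

def distinct_triple_sum(x, y, z):
    """Sum of x[i]*y[j]*z[k] over pairwise distinct indices i, j, k."""
    n = len(x)
    return sum(x[i] * y[j] * z[k] for i, j, k in permutations(range(n), 3))

def inclusion_exclusion_sum(x, y, z):
    """Same quantity, assembled from unrestricted sums."""
    n = len(x)
    sx, sy, sz = sum(x), sum(y), sum(z)
    full = sx * sy * sz                                  # unrestricted triple sum (A_1)
    b3 = sum(x[i] * y[i] for i in range(n)) * sz         # terms with i = j
    b1 = sum(y[i] * z[i] for i in range(n)) * sx         # terms with j = k
    b2 = sum(x[i] * z[i] for i in range(n)) * sy         # terms with i = k
    triple = sum(x[i] * y[i] * z[i] for i in range(n))   # terms with i = j = k (A_3)
    return full - (b1 + b2 + b3) + 2 * triple

def sample_inputs(n=6, eta=0.1, seed=1):
    """theta_eta(lambda_i - E_j) for random points lambda_i and three energies E_j."""
    rng = random.Random(seed)
    lam = [rng.uniform(-1, 1) for _ in range(n)]
    energies = (0.10, 0.15, 0.20)
    return [[theta(l - e, eta) for l in lam] for e in energies]
```

The factor 2 in front of the fully coinciding sum is what produces the coefficient of $A_3$.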
Notice that, modulo a trivial change in the prefactor, $\mathbb{E}^w A_1$ can be approximated by

$$\mathbb{E}^w F\Big(\frac{1}{N}\mathrm{Im}\,\mathrm{Tr}\,\frac{1}{H^{(w)} - z_1}, \ldots, \frac{1}{N}\mathrm{Im}\,\mathrm{Tr}\,\frac{1}{H^{(w)} - z_3}\Big), \qquad z_j := E_j + i\eta,$$

where the function $F$ is chosen to be $F(x_1, x_2, x_3) := x_1x_2x_3$ if $\max_j |x_j| \le N^{\varepsilon}$ and it is smoothly cut off to go to zero in the regime $\max_j |x_j| \ge N^{2\varepsilon}$. The difference between the expectation of $F$ and $A_1$ is negligible, since it comes from the regime where $N^{\varepsilon} \le \max_j \frac{1}{N}|\mathrm{Im}\,\mathrm{Tr}\,(H^{(w)} - z_j)^{-1}| \le N^2$, which has an exponentially small probability by (8.3) (the upper bound on the Green's function always holds since $\eta \ge N^{-2}$). Here the arguments of $F$ are imaginary parts of the trace of the Green's function, but this type of function is allowed when applying Theorem 2.3, since

$$\mathrm{Im}\,\mathrm{Tr}\, G(z) = \frac{1}{2i}\big[\mathrm{Tr}\, G(z) - \mathrm{Tr}\, G(\bar z)\big].$$

We remark that the main assumption (2.21) for Theorem 2.3 is satisfied by using (2.17) of Theorem 2.1 with the choice of $M \asymp N$.
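As a small illustration of why sums of the approximate delta function are functions of traces of Green's functions, the sketch below works directly with a list of real eigenvalues (so that $\mathrm{Tr}\, G(z) = \sum_i (\lambda_i - z)^{-1}$) and checks that $\sum_i \theta_\eta(\lambda_i - E)$ coincides with $\mathrm{Im}\,\mathrm{Tr}\, G(E + i\eta)$ and with $\frac{1}{2i}[\mathrm{Tr}\, G(z) - \mathrm{Tr}\, G(\bar z)]$. The function names are illustrative only.

```python
def theta(x, eta):
    """Approximate delta function (times pi): Im 1/(x - i*eta)."""
    return eta / (x * x + eta * eta)

def trace_resolvent(eigenvalues, z):
    """Tr (H - z)^{-1} expressed through the (real) eigenvalues of H."""
    return sum(1 / (lam - z) for lam in eigenvalues)

def im_trace_via_difference(eigenvalues, z):
    """Im Tr G(z) computed as (1/2i) [Tr G(z) - Tr G(conj z)]."""
    diff = trace_resolvent(eigenvalues, z) - trace_resolvent(eigenvalues, z.conjugate())
    return (diff / 2j).real

def theta_sum(eigenvalues, energy, eta):
    """Sum over eigenvalues of theta_eta(lambda_i - E)."""
    return sum(theta(lam - energy, eta) for lam in eigenvalues)
```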

Similarly, we can approximate $\mathbb{E}^w B_3$ by

$$\mathbb{E}^w G\Big(\frac{1}{N^2}\mathrm{Tr}\Big\{\mathrm{Im}\,\frac{1}{H^{(w)} - z_1}\,\mathrm{Im}\,\frac{1}{H^{(w)} - z_2}\Big\},\; \frac{1}{N}\mathrm{Im}\,\mathrm{Tr}\,\frac{1}{H^{(w)} - z_3}\Big),$$

where $G(x_1, x_2) = x_1x_2$ with an appropriate cutoff for large arguments, and there are similar expressions for $B_1$, $B_2$ and also for $A_3$, the latter involving the trace of the product of three resolvents. By Theorem 2.3, these expectations w.r.t. $w$ in the approximations of $\mathbb{E}^w A_i$ can be replaced by expectations w.r.t. $v$ with only negligible errors provided that $\eta \ge N^{-1-\varepsilon}$. We have thus proved that

$$\lim_{N\to\infty}\int_{\mathbb{R}^3} dx_1 dx_2 dx_3\,\Big[p^{(3)}_{w,N}(x_1,x_2,x_3) - p^{(3)}_{v,N}(x_1,x_2,x_3)\Big]\theta_\eta(x_1 - E_1)\theta_\eta(x_2 - E_2)\theta_\eta(x_3 - E_3) = 0. \tag{8.8}$$

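Set $\eta = N^{-1-\varepsilon}$ for the rest of the proof. We now show that the validity of (8.8) for any choice of $E$, $\alpha_1, \alpha_2, \alpha_3$ (recall $E_j = E + \alpha_j/N$) implies that the rescaled correlation functions, $p^{(3)}_{w,N}(E + \beta_1/N, \ldots, E + \beta_3/N)$ and $p^{(3)}_{v,N}(E + \beta_1/N, \ldots, E + \beta_3/N)$, as functions of the variables $\beta_1, \beta_2, \beta_3$, have the same weak limit.

Let $O$ be a smooth, compactly supported test function and let

$$O_\eta(\beta_1, \beta_2, \beta_3) := \frac{1}{(\pi N)^3}\int_{\mathbb{R}^3} d\alpha_1 d\alpha_2 d\alpha_3\; O(\alpha_1, \alpha_2, \alpha_3)\,\theta_\eta\Big(\frac{\beta_1 - \alpha_1}{N}\Big)\cdots\theta_\eta\Big(\frac{\beta_3 - \alpha_3}{N}\Big)$$

be its smoothing on scale $N\eta$. Then we can write

$$\int_{\mathbb{R}^3} d\beta_1 d\beta_2 d\beta_3\; O(\beta_1,\beta_2,\beta_3)\, p^{(3)}_{w,N}\Big(E + \frac{\beta_1}{N}, \ldots, E + \frac{\beta_3}{N}\Big) = \int_{\mathbb{R}^3} d\beta_1 d\beta_2 d\beta_3\; O_\eta(\beta_1,\beta_2,\beta_3)\, p^{(3)}_{w,N}\Big(E + \frac{\beta_1}{N}, \ldots, E + \frac{\beta_3}{N}\Big)$$

$$\qquad + \int_{\mathbb{R}^3} d\beta_1 d\beta_2 d\beta_3\; (O - O_\eta)(\beta_1,\beta_2,\beta_3)\, p^{(3)}_{w,N}\Big(E + \frac{\beta_1}{N}, \ldots, E + \frac{\beta_3}{N}\Big). \tag{8.9}$$

The first term on the right side, after the change of variables $x_j = E + \beta_j/N$, is equal to

$$\frac{1}{\pi^3}\int_{\mathbb{R}^3} d\alpha_1 d\alpha_2 d\alpha_3\; O(\alpha_1, \alpha_2, \alpha_3)\int_{\mathbb{R}^3} dx_1 dx_2 dx_3\; p^{(3)}_{w,N}(x_1, x_2, x_3)\,\theta_\eta(x_1 - E_1)\theta_\eta(x_2 - E_2)\theta_\eta(x_3 - E_3), \tag{8.10}$$

i.e., it can be written as an integral of expressions of the form (8.8) for which limits with $p^{(3)}_{w,N}$ and $p^{(3)}_{v,N}$ coincide.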

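Finally, the second term on the right hand side of (8.9) is negligible. To see this, notice that for any test function $Q$, we have

$$\int_{\mathbb{R}^3} d\beta_1 d\beta_2 d\beta_3\; Q(\beta_1, \beta_2, \beta_3)\, p^{(3)}_{w,N}\Big(E + \frac{\beta_1}{N}, \ldots, E + \frac{\beta_3}{N}\Big) = N^3\int_{\mathbb{R}^3} dx_1 dx_2 dx_3\; Q\big(N(x_1 - E), N(x_2 - E), N(x_3 - E)\big)\, p^{(3)}_{w,N}(x_1, x_2, x_3)$$

$$\qquad = \frac{N^3}{N(N-1)(N-2)}\,\mathbb{E}^w\sum_{i\ne j\ne k} Q\big(N(\lambda_i - E), N(\lambda_j - E), N(\lambda_k - E)\big). \tag{8.11}$$

If the test function $Q$ were supported on a ball of size $N^{\varepsilon'}$, $\varepsilon' > 0$, then this last term would be bounded by

$$\|Q\|_\infty\,\mathbb{E}^w\,\mathcal{N}^3_{\tau}(E) \le C\|Q\|_\infty N^{4\varepsilon'}, \qquad \tau = N^{-1+\varepsilon'}. \tag{8.12}$$

Here $\mathcal{N}_\tau(E)$ denotes the number of eigenvalues in the interval $[E - \tau, E + \tau]$ and in the estimate we used the local semicircle law on intervals of size $\tau \ge N^{-1+\varepsilon'}$.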

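Set now $Q := O - O_\eta$. From the definition of $O_\eta$, it is easy to see that the function

$$Q_1(\beta_1, \beta_2, \beta_3) := \big[O(\beta_1, \beta_2, \beta_3) - O_\eta(\beta_1, \beta_2, \beta_3)\big]\prod_{j=1}^{3}\mathbf{1}\big(|\beta_j| \le N^{\varepsilon'}\big)$$

satisfies the bound $\|Q_1\|_\infty \le \|Q\|_\infty = \|O - O_\eta\|_\infty \le CN\eta = CN^{-\varepsilon}$. So choosing $\varepsilon' < \varepsilon/4$, the contribution of $Q_1$ is negligible. Finally, since $O$ is compactly supported, for $N$ large enough $Q_2 = Q - Q_1$ is given by

$$Q_2(\beta_1, \beta_2, \beta_3) = -O_\eta(\beta_1, \beta_2, \beta_3)\Big[1 - \prod_{j=1}^{3}\mathbf{1}\big(|\beta_j| \le N^{\varepsilon'}\big)\Big]$$

and

$$|Q_2| \le C\Big[\prod_{j=1}^{3}\frac{1}{1 + \beta_j^2}\Big]\Big\{\mathbf{1}\big(|\beta_1| \ge N^{\varepsilon'}\big) + \ldots\Big\} \le C\Big\{\frac{1}{N^{2\varepsilon'} + \beta_1^2}\prod_{j=2}^{3}\frac{1}{1 + \beta_j^2} + \ldots\Big\}. \tag{8.13}$$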



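Hence the contribution of $Q_2$ in the last term of (8.11) is bounded by terms of the form

$$C\,\mathbb{E}^w\sum_{i,j,k}\left[\frac{N^{-2}}{N^{-2+2\varepsilon'} + (\lambda_i - E)^2}\right]\left[\frac{N^{-2}}{N^{-2} + (\lambda_j - E)^2}\right]\left[\frac{N^{-2}}{N^{-2} + (\lambda_k - E)^2}\right].$$

From Theorem 2.1, the last term is bounded by $N^{-\varepsilon'}$ up to some logarithmic factor. This completes the proof of Theorem 6.4. □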
A Spectral condition for band matrices

Lemma A.1 Let $B = (\sigma_{ij}^2)$ satisfy (2.1) and (2.8) with $W \ge 1$ and with $f$ being a nonnegative symmetric function with $\int f = 1$ and $f \in L^\infty(\mathbb{R})$. Then we have

$$B \ge -1 + \delta \tag{A.1}$$

for some $\delta > 0$ and $W$ large enough, depending on $f$.

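Proof. Recall that the discrete Fourier transform in $d = 1$ dimensions is defined as follows. Let $\varepsilon := 1/N$ and

$$\Lambda_\varepsilon := \Lambda = \varepsilon\mathbb{Z}/\mathbb{Z}$$

be the periodic one dimensional lattice (torus) of size 1 and spacing $\varepsilon$ with its dual lattice being

$$\Lambda_\varepsilon^* := \Lambda^* := (2\pi\mathbb{Z})/\big(\tfrac{2\pi}{\varepsilon}\mathbb{Z}\big).$$

Let $\psi$ be a function on $\Lambda$. Then its Fourier transform $\mathcal{F}_N\psi$ is a function on $\Lambda^*$ defined as

$$\hat\psi(p) = (\mathcal{F}_N\psi)(p) := \varepsilon\sum_{x\in\Lambda}\psi(x)e^{-ipx},$$

and it is an isometry

$$\varepsilon\sum_{x\in\Lambda}|\psi(x)|^2 = \sum_{p\in\Lambda^*}|\hat\psi(p)|^2.$$

In our case, $x = k/N$ and define

$$F_W(x) := NW^{-1}f(xN/W).$$

Then, for $p \in \Lambda^*$, we have

$$(\mathcal{F}_N F_W)(p) = \varepsilon\sum_{x\in\Lambda}F_W(x)e^{-ipx} = \sum_{k=1}^{N}W^{-1}f(k/W)e^{-i(Wp/N)(k/W)} = \hat f(q) + o(1), \qquad q = Wp/N,$$

where the error term vanishes as $W \to \infty$ and $\hat f$ denotes the usual Fourier transform in $L^1(\mathbb{R})$

$$\hat f(q) = \int f(y)e^{-iqy}\,dy.$$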

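With this formula, and with the notation $\psi_N(i) := \psi(i/N)$ for any $\psi$ defined on $\Lambda$, we have

$$(B\psi_N)(k) = \varepsilon\sum_{\ell=1}^{N}NW^{-1}f\big((k - \ell)/W\big)\psi_N(\ell) = \varepsilon\sum_{y\in\Lambda}F_W\Big(\frac{k}{N} - y\Big)\psi(y) = \sum_{p\in\Lambda^*}e^{ipk/N}(\mathcal{F}_N F_W)(p)\,\hat\psi(p)$$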

and 

N 

pSA* j=l 
which is normalized to be 1. Hence $B$ on the Fourier side acts as a multiplication by the function $\mathcal{F}_N F_W$, so

$$\mathrm{Spec}\, B = \mathrm{Range}\,\mathcal{F}_N F_W \subset \mathrm{Range}\,\hat f + o(1).$$

Since $f$ is a nonnegative, symmetric function and $\int f = 1$, we have that $\hat f$ is real and

$$\inf \hat f > -1 + \delta$$

for some $\delta > 0$, which completes the proof.
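The statement that $B$ acts as a Fourier multiplier can be illustrated numerically. The sketch below assumes the translation invariant form $\sigma_{ij}^2 \propto W^{-1}f((i-j)/W)$ on the $N$-torus (an assumption about (2.8), which is not reproduced in this section), normalizes the row sums to 1, and evaluates the symbol $\sum_m c_m e^{-2\pi i p m/N}$, which for symmetric $f$ is a cosine sum. The box profile is only an example of an admissible $f$; all names are illustrative.

```python
import math

def box(y):
    """Example profile: indicator of [-1/2, 1/2]; nonnegative, symmetric, integral 1."""
    return 1.0 if abs(y) <= 0.5 else 0.0

def band_symbol(N, W, f):
    """Eigenvalues of the circulant matrix B_{kl} = c_{(k-l) mod N}, c_m ~ f(m/W)/W."""
    def signed(m):
        # representative of m modulo N with smallest absolute value
        return m if m <= N // 2 else m - N
    c = [f(signed(m) / W) / W for m in range(N)]
    total = sum(c)
    c = [cm / total for cm in c]  # enforce that every row of B sums to 1
    # For symmetric c the sine parts cancel, so the symbol is a real cosine sum.
    return [sum(c[m] * math.cos(2 * math.pi * p * m / N) for m in range(N))
            for p in range(N)]
```

For `band_symbol(200, 20, box)` the largest eigenvalue is 1 (the constant vector) and the smallest one stays well above $-1$, in line with (A.1); the continuum counterpart is $\inf_q \sin(q/2)/(q/2) \approx -0.22$.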



B Large deviation estimates 

In this Appendix we prove two large deviations results. They are weaker than the corresponding results 
of Hanson and Wright [22], used in [14], but they require only independent, not necessarily identically 
distributed random variables, moreover the proofs are much simpler. 

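Lemma B.1 Let $a_i$ ($1 \le i \le N$) be $N$ independent complex random variables with mean zero, variance $\sigma^2$ and uniform subexponential decay, i.e., there exist $\alpha, \beta > 0$ such that for any $x > 0$

$$\mathbb{P}\{|a_i| \ge x^{\alpha}\} \le \beta e^{-x}. \tag{B.1}$$

Then for any $A_i \in \mathbb{C}$ ($1 \le i \le N$) and $D > 1$ we have

$$\mathbb{P}\left\{\Big|\sum_{i=1}^{N} a_iA_i\Big| \ge D\sigma\Big(\sum_i |A_i|^2\Big)^{1/2}\right\} \le C\exp\Big(-cD^{\frac{2}{1+2\alpha}}\Big) \tag{B.2}$$

for some positive constants $C$ and $c$ depending on $\alpha$ and $\beta$ in (B.1).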

Proof of Lemma B.1. Without loss of generality, we may assume that $\sigma = 1$. The assumption (B.1) implies that the $k$-th moment of $a_i$ is bounded by:

$$\mathbb{E}|a_i|^k \le (Ck)^{\alpha k} \tag{B.3}$$

for some $C > 0$ depending on $\alpha$ and $\beta$.

First, for $p \in \mathbb{N}$, we estimate

$$\mathbb{E}\Big|\sum_{i=1}^{N} a_iA_i\Big|^p. \tag{B.4}$$

With the Marcinkiewicz-Zygmund inequality, for an integer $p \ge 2$, we have

$$\mathbb{E}\Big|\sum_{i} a_iA_i\Big|^p \le (Cp)^{p/2}\,\mathbb{E}\Big(\sum_i |a_iA_i|^2\Big)^{p/2} \tag{B.5}$$

(for the estimate of the constant, see e.g. Exercise 2.2.30 of [30]). Using (B.3), we have $\mathbb{E}|a_i|^p \le (Cp)^{\alpha p}$. Inserting it into (B.5), we obtain

$$\mathbb{E}\Big|\sum_{i} a_iA_i\Big|^p \le (Cp)^{\frac{p}{2} + \alpha p}\Big(\sum_i |A_i|^2\Big)^{p/2}, \tag{B.6}$$

which implies (B.2) by choosing an even integer $p$ of the order $(D/Ce)^{\frac{2}{1+2\alpha}}$ and applying a high moment Markov inequality. □
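The final step, choosing $p$ in the high moment Markov inequality, can be made concrete. The sketch below assumes a moment bound of the form $\mathbb{E}|X|^p \le (Cp)^{(1/2+\alpha)p}S^p$, so that Markov's inequality gives $\mathbb{P}(|X| \ge DS) \le \big((Cp)^{1/2+\alpha}/D\big)^p$. Choosing $p$ with $(Cp)^{1/2+\alpha} = D/e$ turns the right hand side into $e^{-p}$ with $p$ proportional to $D^{2/(1+2\alpha)}$. The integrality of $p$ is ignored, and the function names and the default value of `C` are illustrative assumptions.

```python
import math

def log_markov_bound(D, p, alpha, C=1.0):
    """Logarithm of ((C p)^(1/2 + alpha) / D)^p, the high moment Markov bound."""
    return p * ((0.5 + alpha) * math.log(C * p) - math.log(D))

def balanced_p(D, alpha, C=1.0):
    """The value of p solving (C p)^(1/2 + alpha) = D / e."""
    return (D / math.e) ** (1.0 / (0.5 + alpha)) / C
```

With this choice `log_markov_bound(D, balanced_p(D, alpha), alpha)` equals `-balanced_p(D, alpha)`, i.e. the bound is $\exp(-cD^{2/(1+2\alpha)})$.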



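Lemma B.2 Let $a_i$ ($1 \le i \le N$) be $N$ independent random complex variables with mean zero, variance $\sigma^2$ and having the uniform subexponential decay (B.1). Let $B_{ij} \in \mathbb{C}$ ($1 \le i, j \le N$). Then we have that

$$\mathbb{P}\left\{\Big|\sum_{i=1}^{N}\bar a_iB_{ii}a_i - \sum_{i=1}^{N}\sigma^2B_{ii}\Big| \ge D\sigma^2\Big(\sum_{i=1}^{N}|B_{ii}|^2\Big)^{1/2}\right\} \le C\exp\Big(-cD^{\frac{2}{1+4\alpha}}\Big), \tag{B.7}$$

$$\mathbb{P}\left\{\Big|\sum_{i\ne j}\bar a_iB_{ij}a_j\Big| \ge D\sigma^2\Big(\sum_{i\ne j}|B_{ij}|^2\Big)^{1/2}\right\} \le C\exp\Big(-cD^{\frac{1}{2(1+\alpha)}}\Big) \tag{B.8}$$

for some positive constants $C$ and $c$ depending on $\alpha$ and $\beta$ in (B.1).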

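Proof of Lemma B.2. Without loss of generality we may again assume that $\sigma = 1$. First, we prove (B.7). Notice that $|a_i|^2 - 1$ ($1 \le i \le N$) are independent random variables with mean $0$ and variance less than some constant $C$. Furthermore, the $k$-th moment of $|a_i|^2 - 1$ is bounded as

$$\mathbb{E}\big(|a_i|^2 - 1\big)^k \le (Ck)^{2\alpha k}. \tag{B.9}$$

Then following the proof of Lemma B.1 with $|a_i|^2 - 1$ replacing $a_i$, we obtain (B.7).

Next, we prove (B.8). For any $p \in \mathbb{N}$, $p \ge 2$, we estimate

$$\mathbb{E}\Big|\sum_{i > j}\bar a_iB_{ij}a_j\Big|^p = \mathbb{E}\Big|\sum_{i}\bar a_i\xi_i\Big|^p, \tag{B.10}$$

where $\xi_i := \sum_{j<i}B_{ij}a_j$. Note that $a_i$ and $\xi_i$ are independent for any fixed $i$. By the definition,

$$\sum_{i=1}^{n}\bar a_i\xi_i, \qquad n = 1, \ldots, N, \tag{B.11}$$

is a martingale. Using the Burkholder inequality, we have that

$$\mathbb{E}\Big|\sum_i \bar a_i\xi_i\Big|^p \le (Cp)^{3p/2}\,\mathbb{E}\Big(\sum_i |\bar a_i\xi_i|^2\Big)^{p/2} \tag{B.12}$$

(for the constant, see Section VII.3 of [28]). By the generalized Minkowski inequality, by the independence of $a_i$ and $\xi_i$ and using (B.3), we have

$$\Big[\mathbb{E}\Big(\sum_i |\bar a_i\xi_i|^2\Big)^{p/2}\Big]^{2/p} \le \sum_i \Big[\mathbb{E}|\bar a_i\xi_i|^p\Big]^{2/p} = \sum_i \Big[\mathbb{E}\big(|a_i|^p\big)\,\mathbb{E}\big(|\xi_i|^p\big)\Big]^{2/p} \le (Cp)^{2\alpha}\sum_i \Big[\mathbb{E}\big(|\xi_i|^p\big)\Big]^{2/p}.$$

Using (B.6), we have

$$\mathbb{E}\big(|\xi_i|^p\big) \le (Cp)^{\frac{p}{2} + \alpha p}\Big(\sum_j |B_{ij}|^2\Big)^{p/2}.$$

Combining this with (B.12) we obtain

$$\mathbb{E}\Big|\sum_i \bar a_i\xi_i\Big|^p \le (Cp)^{2p(1+\alpha)}\Big(\sum_i\sum_j |B_{ij}|^2\Big)^{p/2}. \tag{B.13}$$

Then choosing $p = (D/Ce)^{\frac{1}{2(1+\alpha)}}$ and applying the Markov inequality, we obtain (B.8). □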

In our applications we will need these two lemmas when $D$ is a power of $\log N$. For simplicity, we do not want to keep track of the precise powers in the estimate and we are interested only in error bounds that decay faster than any fixed power of $N$, say $CN^{-\log\log N}$. Therefore, in this paper we will use the following weaker form of these two lemmas; the stronger form will be useful in future applications.

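Corollary B.3 Let $a_i$ ($1 \le i \le N$) be $N$ independent random complex variables with mean zero, variance $\sigma^2$ and having the uniform subexponential decay (B.1). Let $A_i, B_{ij} \in \mathbb{C}$ ($1 \le i, j \le N$). Then we have that

$$\mathbb{P}\left\{\Big|\sum_{i=1}^{N}a_iA_i\Big| \ge (\log N)^{\frac{3}{2}+\alpha}\,\sigma\Big(\sum_i |A_i|^2\Big)^{1/2}\right\} \le CN^{-\log\log N}, \tag{B.14}$$

$$\mathbb{P}\left\{\Big|\sum_{i=1}^{N}\bar a_iB_{ii}a_i - \sum_{i=1}^{N}\sigma^2B_{ii}\Big| \ge (\log N)^{\frac{3}{2}+2\alpha}\,\sigma^2\Big(\sum_{i=1}^{N}|B_{ii}|^2\Big)^{1/2}\right\} \le CN^{-\log\log N}, \tag{B.15}$$

$$\mathbb{P}\left\{\Big|\sum_{i\ne j}\bar a_iB_{ij}a_j\Big| \ge (\log N)^{3+2\alpha}\,\sigma^2\Big(\sum_{i\ne j}|B_{ij}|^2\Big)^{1/2}\right\} \le CN^{-\log\log N} \tag{B.16}$$

for some constants $C$ depending on $\alpha$ and $\beta$ in (B.1).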



C Proof of Lemma 6.5 

We first prove a version of this lemma when the fourth moment exactly matches, i.e., $\gamma = 0$, then we explain how to deal with the approximation. More precisely, we first show the following:

Lemma C.1 Under the condition

$$m_4 - m_3^2 - 1 \ge C_1, \qquad m_4 \le C_2 \tag{C.1}$$

for some positive constants $C_1$ and $C_2$, there exists a real random variable $\xi$ such that the first four moments of $\xi$ are $0$, $1$, $m_3$ and $m_4$, and the distribution $\nu$ of $\xi$ satisfies the logarithmic Sobolev inequality (LSI) and the LSI constant is bounded from above by a function of $C_1$ and $C_2$. Moreover, $\nu$ can be chosen to be absolutely continuous with a smooth positive density, $\nu(dx) = e^{-U(x)}dx$, such that the derivatives of $U$ satisfy

$$|U^{(k)}(x)| \le C_k(1 + x^2)^{Ck} \tag{C.2}$$

with some fixed constant $C$ and $k$-dependent constants $C_k$.
Remark. The last statement about the smoothness of $U$ will not be needed in this paper, but we state it for further reference.



Proof. We start with the case $|m_3| \ge \delta$, where $\delta$ is a small enough number depending only on $C_1$, see below. Let $\xi$ be the sum of two Gaussians, with density function of the form

$$f_\xi(x) = \frac{b}{a+b}\,\frac{1}{\sqrt{2\pi\sigma}}\,e^{-(x-a)^2/(2\sigma)} + \frac{a}{a+b}\,\frac{1}{\sqrt{2\pi\sigma}}\,e^{-(x+b)^2/(2\sigma)} \tag{C.3}$$

with some parameters $a > 0$, $b > 0$, $\sigma > 0$. If the first 4 moments of $f_\xi(x)$ are $0$, $1$, $m_3$ and $m_4$, then we have the relations

$$m_4 = 1 + 4\sigma - 2\sigma^2 + \frac{m_3^2}{1 - \sigma}, \tag{C.4}$$

$$ab = 1 - \sigma \quad\text{and}\quad a - b = \frac{m_3}{1 - \sigma}. \tag{C.5}$$

With $m_3$, $m_4$ in (C.1) and $|m_3| \ge \delta$, one can always find a solution of (C.4) such that $0 < \sigma < 1$. Actually, one can see that $c \le \sigma \le C$, where $c$ and $C$ only depend on $C_1$, $C_2$ in (6.19) and $\delta$.

Once $\sigma$ is found, it is easy to check that one can always find real solutions $a$, $b$ for (C.5) as long as $m_3$, $m_4$ satisfy (C.1) and $|m_3| \ge \delta$. Since the solutions $a$, $b$, $\sigma$ are continuous with respect to $m_3$ and $m_4$, they are uniformly bounded. Distributions of the form (C.3) satisfy the LSI, since they are log concave away from a compact set. Since the parameters $a$, $b$, $\sigma$ are in a compact set, the LSI constant will remain uniformly bounded with a bound depending on $C_1$, $C_2$ and $\delta$. It is clear that the density function (C.3) is positive and its logarithm satisfies (C.2).
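The relations (C.4) and (C.5) can be solved numerically and checked against the moments of the mixture. The sketch below assumes the mixture weights $b/(a+b)$ and $a/(a+b)$ (the choice that makes the mean vanish), treats $\sigma$ as the common variance of the two Gaussian components, and finds $\sigma$ by bisection, using that the right hand side of (C.4) is increasing in $\sigma$ on $(0,1)$. Function names are illustrative.

```python
import math

def two_gaussian_parameters(m3, m4):
    """Solve (C.4) for sigma in (0, 1) by bisection, then (C.5) for a and b.

    Assumes m4 > 1 + m3^2 and m3 != 0, so that a root exists in (0, 1).
    """
    def g(s):
        return 1 + 4 * s - 2 * s * s + m3 * m3 / (1 - s) - m4
    lo, hi = 0.0, 1.0 - 1e-12   # g(lo) < 0 and g(hi) > 0 under the assumptions
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    d = m3 / (1 - s)                                  # a - b
    b = (-d + math.sqrt(d * d + 4 * (1 - s))) / 2     # positive root of b (b + d) = 1 - s
    a = b + d
    return a, b, s

def mixture_moments(a, b, s):
    """First four moments of the mixture with means a and -b and common variance s."""
    w1, w2 = b / (a + b), a / (a + b)
    def raw(mu):
        # raw moments of a Gaussian with mean mu and variance s
        return (mu, mu**2 + s, mu**3 + 3 * mu * s, mu**4 + 6 * mu**2 * s + 3 * s**2)
    return tuple(w1 * x + w2 * y for x, y in zip(raw(a), raw(-b)))
```

For instance, `two_gaussian_parameters(0.5, 2.5)` returns parameters with $0 < \sigma < 1$, $a, b > 0$, and `mixture_moments` applied to them reproduces the moments $0$, $1$, $0.5$, $2.5$ up to rounding.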

Now we consider the case that $|m_3| \le \delta$ with a small $\delta = \frac{1}{100}\min\{1, C_1\}$, where $C_1$ is the constant in (C.1). Without loss of generality, we may assume $m_3 \ge 0$. We consider the following three parameter family of probability densities

$$f_{d,\beta,\varepsilon}(x) = (1 - \varepsilon)\,g_{d,\beta}(x) + \varepsilon h(x)$$

with

$$g_{d,\beta}(x) = \frac{\beta + 1}{2d^{\beta+1}}\,|x|^{\beta}\cdot\mathbf{1}(|x| \le d), \qquad h(x) = \frac{b}{a+b}\,\frac{1}{\sqrt{2\pi}}\,e^{-(x-a)^2/2} + \frac{a}{a+b}\,\frac{1}{\sqrt{2\pi}}\,e^{-(x+b)^2/2},$$

where the parameters are in the range $-1 < \beta < \infty$, $0 < d < \infty$, $0 < \varepsilon \ll 1$ and $a$, $b$ will be chosen explicitly. Simple calculation shows that the moments of $f_{d,\beta,\varepsilon}$ are $m_1 = 0$,

$$m_2 = (1 - \varepsilon)\frac{\beta + 1}{\beta + 3}\,d^2 + \varepsilon(1 + ab), \tag{C.6}$$

$$m_3 = \varepsilon\, ab(a - b), \tag{C.7}$$

$$m_4 = (1 - \varepsilon)\frac{\beta + 1}{\beta + 5}\,d^4 + \varepsilon\big[3 + ab(6 + a^2 + b^2 - ab)\big]. \tag{C.8}$$

Choosing, say, $a = 2$, $b = 1$, and setting $m_2 = 1$, we obtain $d^2 = \frac{1 - 3\varepsilon}{1 - \varepsilon}\,\frac{\beta + 3}{\beta + 1}$ from the first equation, $\varepsilon = m_3/2$ from the second equation and finally the last equation becomes

$$m_4 = \frac{(1 - 3m_3/2)^2}{1 - m_3/2}\,\frac{(\beta + 3)^2}{(\beta + 1)(\beta + 5)} + \frac{21}{2}\,m_3. \tag{C.9}$$

Recall that we are in the regime where $|m_3| \le \delta \le C_1/100$. For any fixed $0 \le m_3 \le \delta$, the right hand side of (C.9) is a monotonically decreasing function in $\beta \in (-1, \infty)$ whose value goes down from $\infty$ to $\frac{(1 - 3m_3/2)^2}{1 - m_3/2} + \frac{21}{2}m_3 \le 1 + 20\delta$. But we know from (C.1) that $C_2 \ge m_4 \ge 1 + 100\delta$, thus there is a value $\beta$ such that (C.9) holds; moreover, $\beta$ is in a compact subinterval of $(-1, \infty)$ that depends only on $\delta$ and $C_2$. It is then easy to check that the support and the supremum norm of the density $g_{d,\beta}$ also remain in a compact set, depending only on $\delta$. Therefore we constructed a probability measure with the given moments, that is a linear combination of two Gaussians plus a compactly supported piece with a nonnegative bounded density. To ensure smoothness, we replace $g_{d,\beta}$ with $g_{d,\beta,\tau} := \vartheta_\tau * g_{d,\beta}$, where $\vartheta_\tau(x) = \tau^{-1}\vartheta(x/\tau)$ and $\vartheta$ is a compactly supported nonnegative smooth symmetric function with $\int\vartheta = 1$. The first moment $m_1$ is unchanged and the formulas (C.6) for the higher moments will get modified by an error term of order $\tau$. Let $\tau$ be much smaller than all other parameters in this proof. It is easy to see that, by a simple calculation treating $\tau$ as a small perturbation, one can still choose $a$, $b$, $\varepsilon$ and $\beta$ in the previous argument to match $m_2 = 1$, $m_3$ and $m_4$.
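The choice of parameters in the small $m_3$ regime can be traced numerically (before the final smoothing step). With $a = 2$, $b = 1$ the Gaussian part contributes $\varepsilon[3 + ab(6 + a^2 + b^2 - ab)] = 21\varepsilon$ to the fourth moment, which with $\varepsilon = m_3/2$ gives the term $\frac{21}{2}m_3$ used below. The sketch solves (C.9) for $\beta$ by bisection, using that its right hand side is decreasing in $\beta$, and then evaluates (C.6), (C.7), (C.8). Function names and the bisection bracket are illustrative assumptions.

```python
def rhs_c9(beta, m3):
    """Right hand side of (C.9) for a = 2, b = 1."""
    prefactor = (1 - 1.5 * m3) ** 2 / (1 - 0.5 * m3)
    return prefactor * (beta + 3) ** 2 / ((beta + 1) * (beta + 5)) + 10.5 * m3

def solve_beta(m3, m4):
    """Bisection for beta in (-1, infinity); rhs_c9 is decreasing in beta."""
    lo, hi = -1 + 1e-12, 1e6
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        if rhs_c9(mid, m3) > m4:
            lo = mid   # value still too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def family_moments(d2, beta, eps, a=2.0, b=1.0):
    """Moments m2, m3, m4 from (C.6), (C.7), (C.8); d2 stands for d^2."""
    m2 = (1 - eps) * (beta + 1) / (beta + 3) * d2 + eps * (1 + a * b)
    m3 = eps * a * b * (a - b)
    m4 = ((1 - eps) * (beta + 1) / (beta + 5) * d2 ** 2
          + eps * (3 + a * b * (6 + a * a + b * b - a * b)))
    return m2, m3, m4

def parameters_for(m3, m4):
    """Return (d^2, beta, eps) matching m2 = 1, m3 and m4."""
    beta = solve_beta(m3, m4)
    eps = m3 / 2
    d2 = (1 - 3 * eps) / (1 - eps) * (beta + 3) / (beta + 1)
    return d2, beta, eps
```

For example, `family_moments(*parameters_for(0.005, 3.0))` returns values equal to $(1, 0.005, 3)$ up to rounding.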

Finally, note that the sum of two Gaussians satisfies the LSI, as well as its compact perturbation, and the new LSI constant depends only on the supremum norm of the density of the perturbation. Since all these parameters remain uniformly controlled by $C_1$ and $C_2$, we proved Lemma C.1, i.e., Lemma 6.5 for $\gamma = 0$. □

Now consider the case $\gamma > 0$. For any real random variable $\zeta$, independent of $\xi^G$, and with the first 4 moments being $0$, $1$, $m_3(\zeta)$ and $m_4(\zeta) < \infty$, the first 4 moments of

$$\zeta' = (1 - \gamma)^{1/2}\zeta + \gamma^{1/2}\xi^G \tag{C.10}$$

are $0$, $1$,

$$m_3(\zeta') = (1 - \gamma)^{3/2}m_3(\zeta) \tag{C.11}$$

and

$$m_4(\zeta') = (1 - \gamma)^2m_4(\zeta) + 6\gamma - 3\gamma^2. \tag{C.12}$$
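The moment formulas (C.11) and (C.12) follow from a binomial expansion, using independence and the moments $0, 1, 0, 3$ of a standard Gaussian. The sketch below carries out this expansion; it assumes that $\xi^G$ is a standard real Gaussian, and the function name is illustrative.

```python
from math import comb

def mixed_moments(m3, m4, gamma):
    """Moments of order 0..4 of (1-gamma)^(1/2) * zeta + gamma^(1/2) * xi_G.

    zeta has moments 0, 1, m3, m4; xi_G is an independent standard Gaussian.
    """
    zeta = [1.0, 0.0, 1.0, m3, m4]       # E zeta^k for k = 0..4
    gauss = [1.0, 0.0, 1.0, 0.0, 3.0]    # E xi_G^k for k = 0..4
    s, t = (1 - gamma) ** 0.5, gamma ** 0.5
    return [
        sum(comb(n, k) * s**k * zeta[k] * t**(n - k) * gauss[n - k]
            for k in range(n + 1))
        for n in range(5)
    ]
```

The entries of order 3 and 4 of the returned list are $(1-\gamma)^{3/2}m_3$ and $(1-\gamma)^2m_4 + 6\gamma - 3\gamma^2$.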

Given $m_3$ and $m_4$ satisfying (C.1) and using Lemma C.1, we obtain that for any $\gamma$ small enough, there exists a real random variable $\xi_\gamma$ such that the first four moments are $0$, $1$,

$$m_3(\xi_\gamma) = (1 - \gamma)^{-3/2}m_3 \tag{C.13}$$

and

$$m_4(\xi_\gamma) = m_3(\xi_\gamma)^2 + (m_4 - m_3^2).$$

With $m_4 \le C_2$, we have $m_3^2 \le C_2$, thus

$$|m_4(\xi_\gamma) - m_4| \le C\gamma \tag{C.14}$$

for some $C$ depending on $C_2$.

Hence with (C.11) and (C.12), we obtain that $\xi' = (1 - \gamma)^{1/2}\xi_\gamma + \gamma^{1/2}\xi^G$ satisfies $m_3(\xi') = m_3$ and (6.21). With Lemma C.1, we obtain that the LSI constant of $\xi_\gamma$ is bounded by a constant that depends only on $C_1$ and $C_2$, which completes the proof of Lemma 6.5. □